Multilingualization - Glossary

From IUCG - Internet Users Contributing Group

[M] - .ARPA

Originally a reference to the US Government agency that managed some of the Internet’s initial development, now a top-level domain used solely for machine-readable use by computers for certain protocols — such as for reverse IP address lookups, and ENUM. The domain is not designed for general registrations. IANA manages .ARPA in conjunction with the Internet Architecture Board.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
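As a rough illustration of the reverse IP address lookups mentioned above, Python's standard `ipaddress` module can derive the .ARPA name that such a lookup would query. The addresses are documentation examples chosen for this sketch, not values from the glossary.

```python
import ipaddress

def reverse_name(address: str) -> str:
    """Return the .arpa domain name that a reverse (PTR) lookup would query."""
    return ipaddress.ip_address(address).reverse_pointer

print(reverse_name("192.0.2.1"))    # IPv4 reverse names live under in-addr.arpa
print(reverse_name("2001:db8::1"))  # IPv6 reverse names live under ip6.arpa
```

The sketch only builds the name; it performs no DNS query.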

[M] - .INT

A top-level domain devoted solely to international treaty organisations that have independent legal personality. Such organisations are not governed by the laws of any specific country, rather by mutual agreement between multiple countries. IANA maintains the domain registry for this domain.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - A record

The representation of an IPv4 address in the DNS system.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - AAAA record

The representation of an IPv6 address in the DNS system.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Abjad

A writing system in which only consonants are indicated. The term “abjad” is derived from the first four letters of the traditional order of the Arabic script: alef, beh, jeem, dal. (See Section 6.1, Writing Systems.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Abstract Character

A unit of information used for the organization, control, or representation of textual data. (Unicode Standard, section 3.4, D7)
(source: UNICODE, author: Unicode Standard, section 3.4, D7, text listed to be reviewed and possibly changed).
  • When representing data, the nature of that data is generally symbolic as opposed to some other kind of data (for example, aural or visual). Examples of such symbolic data include letters, ideographs, digits, punctuation, technical symbols, and dingbats.
    (source: UNICODE, author: Unicode Standard, section 3.4, D7, text listed to be reviewed and possibly changed).
  • An abstract character has no concrete form and should not be confused with a glyph.
    (source: UNICODE, author: Unicode Standard, section 3.4, D7, text listed to be reviewed and possibly changed).
  • An abstract character does not necessarily correspond to what a user thinks of as a “character” and should not be confused with a grapheme.
    (source: UNICODE, author: Unicode Standard, section 3.4, D7, text listed to be reviewed and possibly changed).
  • The abstract characters encoded by the Unicode Standard are known as Unicode abstract characters.
    (source: UNICODE, author: Unicode Standard, section 3.4, D7, text listed to be reviewed and possibly changed).
  • Abstract characters not directly encoded by the Unicode Standard can often be represented by the use of combining character sequences.
    (source: UNICODE, author: Unicode Standard, section 3.4, D7, text listed to be reviewed and possibly changed).
A unit of information used for the organization, control, or representation of textual data. (Unicode Standard, section 3.4, D7)
(source: ICANN, author: WG/VIP, text listed to be reviewed and possibly changed).
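The distinction between an abstract character, an encoded character, and a combining character sequence can be seen in a short sketch using Python's standard `unicodedata` module. The sample letters are assumptions made for illustration only.

```python
import unicodedata

precomposed = "\u00e9"   # "é" encoded directly as one code point
combining = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

# The same abstract character can be one or two code points long.
print(len(precomposed), len(combining))
print(unicodedata.normalize("NFC", combining) == precomposed)

# "x" with an acute accent has no precomposed code point, so it can only be
# represented as a combining character sequence, even after composition.
x_acute = unicodedata.normalize("NFC", "x\u0301")
print(len(x_acute))
```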

[M] - Abstract Character Sequence

An ordered sequence of one or more abstract characters. (See definition D8 in Section 3.4, Characters and Encoding.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Abugida

A writing system in which consonants are indicated by the base letters that have an inherent vowel, and in which other vowels are indicated by additional distinguishing marks of some kind modifying the base letter. The term “abugida” is derived from the first four letters of the Ethiopic script in the Semitic order: alf, bet, gaml, dant. (See Section 6.1, Writing Systems.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Accent Mark

A mark placed above, below, or to the side of a character to alter its phonetic value. (See also diacritic.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - ACE

see A-label.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - ACE Prefix

The "ACE prefix" is defined in this document to be a string of ASCII characters, "xn--", that appears at the beginning of every A-label. "ACE" stands for "ASCII-Compatible Encoding".
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Acrophonic

Denoting letters or numbers by the first letter of their name. For example, the Greek acrophonic numerals are variant forms of such initial letters.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Adaptation

[RFC 1958] documents how the Internet is to be designed so that it can adapt as it evolves. This is the principle of constant change, perhaps the only principle of the Internet that should survive indefinitely.
(source: RFC 1958, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - Aksara

(1) In Sanskrit grammar, the term for “letter” in general, as opposed to consonant (vyanjana) or vowel (svara). Derived from the first and last letters of the traditional ordering of Sanskrit letters—“a” and “ksha”. (2) More generally, in Indic writing systems, aksara refers to a “syllable,” consisting of a consonant plus vowel sequence, where the vowel may or may not be the inherent vowel of the consonant letter. When multiple consonants are involved, the aksara represents the entire orthographic syllable, which can include two or more leading consonants that may be visually presented in conjunct forms; in such cases, the aksara may not be identical to the phonological syllable.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - A-label

An ASCII-Compatible Encoding form of an IDNA-valid string. It must be a complete label: IDNA is defined for labels, not for parts of them and not for complete domain names. This means, by definition, that every A-label will begin with the IDNA ACE prefix, "xn--", followed by a string that is a valid output of the Punycode algorithm (RFC 3492) and hence a maximum of 59 ASCII characters in length. The prefix and string together must conform to all requirements for a label that can be stored in the DNS including conformance to the rules for LDH labels (see RFC 5890, Section 2.3.1). If and only if a string meeting the above requirements can be decoded into a U-label is it an A-label. (RFC 5890)
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).
An "A-label" is the ASCII-Compatible Encoding (ACE) form of an IDNA-valid string. It must be a complete label: IDNA is defined for labels, not for parts of them and not for complete domain names. This means, by definition, that every A-label will begin with the IDNA ACE prefix, "xn--" (see Section 2.3.2.5), followed by a string that is a valid output of the Punycode algorithm [RFC3492] and hence a maximum of 59 ASCII characters in length. The prefix and string together must conform to all requirements for a label that can be stored in the DNS including conformance to the rules for LDH labels. If and only if a string meeting the above requirements can be decoded into a U-label is it an A-label.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The ASCII-compatible encoded (ACE) representation of an internationalised domain name, i.e. how it is transmitted internally within the DNS protocol. A-labels always commence with the prefix “xn--”. Contrast with U-label.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
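The structural requirements listed above (ACE prefix, LDH syntax, DNS label length, decodable Punycode) can be sketched as a rough check. This is a minimal sketch, not a conformant IDNA validator: the function name is hypothetical, it relies on Python's built-in `punycode` codec, and it does not verify that the decoded string is a valid U-label.

```python
import re

ACE_PREFIX = "xn--"
# LDH syntax: letters, digits and hyphens, with no hyphen at either end.
LDH_LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")

def looks_like_a_label(label: str) -> bool:
    """Structural check only; full U-label validity is NOT verified here."""
    if not label.lower().startswith(ACE_PREFIX):
        return False
    if len(label) > 63 or not LDH_LABEL.match(label):
        return False
    payload = label[len(ACE_PREFIX):]
    try:
        payload.encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    return True

print(looks_like_a_label("xn--bcher-kva"))
print(looks_like_a_label("example"))
```

The 63-character limit is the general DNS label limit; subtracting the four-character prefix gives the 59 characters mentioned in the definitions.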

[M] - A-label form

A-label form uses a compressed, ASCII-compatible encoding (an "ACE" in IDNA and other terminology) produced by an algorithm called Punycode. U-labels and A-labels are duals of each other: transformations from one to the other do not lose information. The transformation mechanisms are specified in the IDNA Protocol document [RFC5891].
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).
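The duality of U-labels and A-labels can be illustrated with a small round trip. This sketch assumes Python's built-in `punycode` codec and two hypothetical helper names; it applies none of the IDNA mapping or validity rules from RFC 5891, so it should be read as an illustration of the encoding step only.

```python
ACE_PREFIX = "xn--"

def to_a_label(u_label: str) -> str:
    """Punycode-encode a label and add the ACE prefix (no IDNA checks)."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")

def to_u_label(a_label: str) -> str:
    """Strip the ACE prefix and Punycode-decode the remainder."""
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

a_label = to_a_label("bücher")
print(a_label)
print(to_u_label(a_label) == "bücher")  # the transformation loses no information
```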

[M] - algorithm

(I) A finite set of step-by-step instructions for a problem-solving or computation procedure, especially one that can be implemented by a computer. (See: cryptographic algorithm.)
(source: , author: , text listed to be reviewed and possibly changed).
A term used in a broad sense in the Unicode Standard, to mean the logical description of a process used to achieve a specified result. This does not require the actual procedure described in the algorithm to be followed; any implementation is conformant as long as the results are the same.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - alias

A name, usually short and easy to remember, that is translated into another name, usually long and difficult to remember.
(source: , author: , text listed to be reviewed and possibly changed).
(I) A name that an entity uses in place of its real name, usually for the purpose of either anonymity or masquerade.
(source: , author: , text listed to be reviewed and possibly changed).
An alternative name given to a domain name (CNAME) or to an IP address (e.g. in a hosts file). RFC 1034: "CNAME identifies the canonical name of an alias". RFC 2181: "It has been traditional to refer to the label of a CNAME record as "a CNAME". This is unfortunate, as "CNAME" is an abbreviation of "canonical name", and the label of a CNAME record is most certainly not a canonical name. It is, however, an entrenched usage. Care must therefore be taken to be very clear whether the label, or the value (the canonical name) of a CNAME resource record is intended. In this document, the label of a CNAME resource record will always be referred to as an alias."
(source: RFC 1034, author: P. Mockapetris, text listed to be reviewed and possibly changed).
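The relationship between an alias (the label of a CNAME record) and its canonical name (the value of that record) can be sketched with toy data. The record tables, names, and address below are hypothetical, and the function is a simplification rather than a DNS resolver.

```python
# Hypothetical zone data: alias -> canonical name, and name -> IPv4 address.
CNAME = {"www.example.org": "web.example.org"}
A = {"web.example.org": "192.0.2.10"}

def resolve(name: str, max_hops: int = 8) -> str:
    """Follow aliases until a name with an address record is reached."""
    for _ in range(max_hops):
        if name in A:
            return A[name]
        if name not in CNAME:
            raise KeyError(f"no record for {name}")
        name = CNAME[name]  # the queried label was an alias; use its canonical name
    raise RuntimeError("alias chain too long")

print(resolve("www.example.org"))
```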

[M] - Aliased name:

A domain name that has been aliased with one or more other names under the concept of Name Aliasing. Note that aliased names may also be delegated.
(source: ICANN, author: WG/VIP, text listed to be reviewed and possibly changed).

[M] - Aliasing

The only difference is in the name.

[M] - Allocation

In a DNS context, the first step on the way to Delegation. A registry (the parent side) is managing a zone. The registry makes an administrative association between a string and some entity that requests the string, making the string a label inside the zone, and a candidate for delegation. Allocation does not affect the DNS itself at all.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).

[M] - Alphabet

A writing system in which both consonants and vowels are indicated. The term “alphabet” is derived from the first two letters of the Greek script: alpha, beta. (See Section 6.1, Writing Systems.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - alphabetic

An informative Unicode property. Characters that are the primary units of alphabets and/or syllabaries, whether combining or noncombining. This includes composite characters that are canonical equivalents to a combining character sequence of an alphabetic base character plus one or more combining characters; letter digraphs; contextual variants of alphabetic characters; ligatures of alphabetic characters; contextual variants of ligatures; modifier letters; letterlike symbols that are compatibility equivalents of single alphabetic letters; and miscellaneous letter elements. <UNICODE>
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Alphabetic Property

Informative property of the primary units of alphabets and/or syllabaries. (See Section 4.10, Letters, Alphabetic, and Ideographic.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Alphabetic Sorting

(See collation.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Annotation

The association of secondary textual content with a point or range of the primary text. (The value of a particular annotation is considered to be a part of the “content” of the text. Typical examples include glossing, citations, exemplification, Japanese yomi, and so on.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[R] - ANSI

(1) The American National Standards Institute. (2) The Microsoft collective name for all Windows code pages. Sometimes used specifically for code page 1252, which is a superset of ISO/IEC 8859-1.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - APIPA

A subcategory of private IP address. See Private IP Addresses.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Apparatus Criticus

Collection of conventions used by editors to annotate and comment on text.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Arabic Digits

The term "Arabic digits" may mean either the digits in the Arabic script (see Arabic-Indic digits) or the ordinary ASCII digits in contrast to Roman numerals (see European digits). When the term "Arabic digits" is used in Unicode specifications, it means Arabic-Indic digits.
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).
Forms of decimal digits used in most parts of the Arabic world (for instance, U+0660, U+0661, U+0662, U+0663). Although European digits (1, 2, 3,…) derive historically from these forms, they are visually distinct and are coded separately. (Arabic digits are sometimes called Indic numerals; however, this nomenclature leads to confusion with the digits currently used with the scripts of India.) Arabic digits are referred to as Arabic-Indic digits in the Unicode Standard. Variant forms of Arabic digits used chiefly in Iran and Pakistan are referred to as Eastern Arabic-Indic digits. (See Section 8.2, Arabic.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Arabic-Indic Digits

Forms of decimal digits used in most parts of the Arabic world (for instance, U+0660, U+0661, U+0662, U+0663). Although European digits (1, 2, 3,…) derive historically from these forms, they are visually distinct and are coded separately. (Arabic-Indic digits are sometimes called Indic numerals; however, this nomenclature leads to confusion with the digits currently used with the scripts of India.) Variant forms of Arabic-Indic digits used chiefly in Iran and Pakistan are referred to as Eastern Arabic-Indic digits. (See Section 8.2, Arabic.)
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).
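That these digits are coded separately from the European digits, yet carry the same numeric values, can be checked with Python's standard `unicodedata` module. The sample strings are assumptions made for this sketch.

```python
import unicodedata

arabic_indic = "\u0660\u0661\u0662\u0663"  # Arabic-Indic digits zero to three
eastern = "\u06f0\u06f1\u06f2\u06f3"       # Eastern Arabic-Indic digits zero to three

print(unicodedata.name(arabic_indic[0]))
print(unicodedata.name(eastern[0]))

# Different code points from "0123", but the same decimal values.
print(arabic_indic == "0123")
print(int(arabic_indic), int(eastern))
```

Python's `int()` accepts any Unicode decimal digit, which is convenient here but may be undesirable when parsing protocol fields that must be ASCII.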

[M] - AREG

A subset of IRIS for performing registration lookups on IP addresses.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[R] - ASCII

(1) The American Standard Code for Information Interchange, a 7-bit coded character set for information interchange. It is the U.S. national variant of ISO/IEC 646 and is formally the U.S. standard ANSI X3.4. It was proposed by ANSI in 1963 and finalized in 1968. (2) The set of 128 Unicode characters from U+0000 to U+007F, including control codes as well as graphic characters. (3) ASCII has been incorrectly used to refer to various 8-bit character encodings that include ASCII characters in the first 128 code points.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
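Senses (2) and (3) can be contrasted in a few lines. The helper name is hypothetical, and code page 1252 is used only as one example of an 8-bit encoding that is loosely called "ASCII".

```python
def is_ascii(text: str) -> bool:
    """True if every character lies in the range U+0000 to U+007F."""
    return all(ord(ch) <= 0x7F for ch in text)

print(is_ascii("DNS"))
print(is_ascii("café"))

# An 8-bit code page stores "é" in a single byte above 0x7F,
# which is outside the 7-bit ASCII range.
print("é".encode("cp1252"))
```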

[M] - ASCII (American Standard Code for Information Interchange)

The standard for transmitting English (or “Latin”) letters over the Internet. DNS was originally limited to only Latin characters because it uses ASCII as its encoding format, although this has been expanded using Internationalised Domain Names in Applications (IDNA).
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).


[M] - ASCII-compatible encoding (ACE)

Starting in 1996, many ASCII-compatible encoding schemes (which are actually transfer encoding syntaxes) have been proposed as possible solutions for internationalizing host names and some other purposes. Their goal is to be able to encode any string of ISO/IEC 10646 characters using the preferred syntax for domain names (as described in STD 13). At the time of this writing, only the ACE encoding produced by Punycode [RFC3492] has become an IETF standard.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The choice of ACE forms to internationalize legacy protocols must be made with care as it can cause some difficult side effects [RFC6055].
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - ASN, AS number

see Autonomous System Number.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Assigned Character

A code point that is assigned to an abstract character. This refers to graphic, format, control, and private-use characters that have been encoded in the Unicode Standard. (See Section 2.4, Code Points and Characters.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Assigned Code Point

(See designated code point.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Assigned Code Point

A mapping from an Abstract Character to a particular Code Point in the code space. See Unicode Standard, section 2.4. Not to be confused with Valid Code Point.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).

[M] - Atomic Character

A character that is not decomposable. (See decomposable character.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - authoritative name server

A domain name server configured to host the official record of the contents of a DNS zone. Each domain name must have a set of these so computers on the Internet can find out the contents of that domain. The set of authoritative name servers for any given domain must be configured as NS records in the parent domain.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[R] - authority

see authoritative name server.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Automatic Private IP Addresses (APIPA)

A subcategory of private IP address that is automatically assigned, as per RFC 3927. See also Private IP addresses.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - autonomous system number (AS number, ASN)

A number used by Internet routing protocols to uniquely identify the routing policy of a particular network operator. AS numbers can be considered similar to a ‘postcode’ used for physical mail. They are allocated to network operators via regional Internet registries.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Base Character

Any graphic character except for those with the General Category of Combining Mark (M). (See definition D51 in Section 3.6, Combination.) In a combining character sequence, the base character is the initial character, which the combining marks are applied to.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Base64

Base64 is a transfer encoding syntax that allows binary data to be represented by the ASCII characters A through Z, a through z, 0 through 9, +, /, and =. It is defined in [RFC2045]. <NONE>
Quoted printable is a transfer encoding syntax that allows strings that have non-ASCII characters mixed in with mostly ASCII printable characters to be somewhat human readable. It is described in [RFC2047]. <NONE>
The quoted printable syntax is generally considered to be a failure at being readable. It is jokingly referred to as "quoted unreadable".
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
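Both transfer encoding syntaxes are available in Python's standard library (`base64` and `quopri`), so their output can be compared directly. The sample word is an assumption for this sketch.

```python
import base64
import quopri

raw = "café".encode("utf-8")    # five bytes, two of them non-ASCII

b64 = base64.b64encode(raw)     # output uses only A-Z, a-z, 0-9, +, / and =
qp = quopri.encodestring(raw)   # ASCII bytes stay readable, others become =XX

print(b64)
print(qp)

# Both encodings are reversible.
print(base64.b64decode(b64) == raw, quopri.decodestring(qp) == raw)
```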

[M] - Basic Multilingual Plane

Plane 0, abbreviated as BMP.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Basic Multilingual Plane (BMP)

The BMP is composed of the first 2^16 code points in ISO/IEC 10646 and contains almost all characters in contemporary use. The BMP is also called "Plane 0".
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Bicameral

A script that distinguishes between two cases. (See case.) Most often used in the context of Latin-based alphabets of Europe and elsewhere in the world.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - BIDI

Abbreviation of bidirectional, in reference to mixed left-to-right and right-to-left text.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - bidirectional display

The process or result of mixing left-to-right oriented text and right-to-left oriented text in a single line is called bidirectional display, often abbreviated as "bidi". <UNICODE> Most of the world's written languages are displayed left-to-right.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
However, many widely-used written languages such as ones based on the Hebrew or Arabic scripts are displayed primarily right-to-left (numerals are a common exception in the modern scripts). Right-to-left text often confuses protocol writers because they have to keep thinking in terms of the order of characters in a string in memory, an order that might be different from what they see on the screen. (Note that some languages are written both horizontally and vertically and that some historical ones use other display orderings.) Further, bidirectional text can cause confusion because there are formatting characters in ISO/IEC 10646 that cause the order of display of text to change. These explicit formatting characters change the display regardless of the implicit left-to-right or right-to-left properties of characters. Text that might contain those characters typically requires careful processing before being sorted or compared for equality.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
It is common to see strings with text in both directions, such as strings that include both text and numbers, or strings that contain a mixture of scripts.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Unicode has a long and incredibly detailed algorithm for displaying bidirectional text [UAX9].
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The process or result of mixing left-to-right text and right-to-left text in a single line. (See Unicode Standard Annex #9, “Unicode Bidirectional Algorithm.”)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
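The implicit directional properties and the explicit formatting characters mentioned above can be inspected with `unicodedata.bidirectional` from Python's standard library. This sketch only reports each character's bidirectional class in memory order; it does not implement the Unicode Bidirectional Algorithm, and the sample string is an assumption.

```python
import unicodedata

# Memory (logical) order: Latin letters, Hebrew letters, then European digits.
text = "abc \u05d0\u05d1\u05d2 123"

for ch in text:
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))

# An explicit formatting character: RIGHT-TO-LEFT OVERRIDE.
print(unicodedata.bidirectional("\u202e"))
```

A simple screening step before sorting or comparison might reject or strip characters whose class is one of the explicit formatting classes, though the right policy depends on the protocol.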

[R] - big-endian

A computer architecture that stores multiple-byte numerical values with the most significant byte (MSB) values first.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
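A brief sketch of the difference, using Python's built-in `int.to_bytes`; the sample value is arbitrary.

```python
value = 0x12345678

big = value.to_bytes(4, "big")        # most significant byte first
little = value.to_bytes(4, "little")  # least significant byte first

print(big.hex(), little.hex())
print(int.from_bytes(big, "big") == value)
```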

[M] - Binary Files

Files containing nontextual information.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[R] - block

A grouping of related characters within the Unicode encoding space. A block may contain unassigned code points, which are reserved.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Blocked (Forbidden) name:

A string not allowed to be registered (i.e., allocated) to anyone in a registry.
(source: ICANN, author: WG/VIP, text listed to be reviewed and possibly changed).

[M] - BMP

Acronym for Basic Multilingual Plane.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - BMP Character

A Unicode encoded character having a BMP code point. (See supplementary character.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - BMP Code Point

A Unicode code point between U+0000 and U+FFFF. (See supplementary code point.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
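Because each plane holds 2^16 code points, the plane of a code point is its value shifted right by 16 bits. The helper names below are hypothetical, and the sample characters are assumptions.

```python
def plane(code_point: int) -> int:
    """Return the plane number of a code point (plane 0 is the BMP)."""
    return code_point >> 16

def is_bmp(code_point: int) -> bool:
    return 0x0000 <= code_point <= 0xFFFF

print(plane(ord("A")), is_bmp(ord("A")))
print(plane(0x1F600), is_bmp(0x1F600))  # a supplementary code point
```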

[M] - BNF

Acronym for Backus-Naur Form, a formal meta-syntax for describing context-free syntaxes. (For details, see Appendix A, Notational Conventions.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - BOCU-1

Acronym for Binary Ordered Compression for Unicode. A Unicode compression scheme that is MIME-compatible (directly usable for e-mail) and preserves binary order, which is useful for databases and sorted lists.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - BOM

Acronym for byte order mark.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Bopomofo

An alphabetic script used primarily in the Republic of China (Taiwan) to write the sounds of Mandarin Chinese and some other dialects. Each symbol corresponds to either the syllable-initial or syllable-final sounds; it is therefore a subsyllabic script in its primary usage. The name is derived from the names of its first four elements. More properly known as zhuyin zimu or zhuyin fuhao in Mandarin Chinese.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Boustrophedon

A pattern of writing seen in some ancient manuscripts and inscriptions, where alternate lines of text are laid out in opposite directions, and where right-to-left lines generally use glyphs mirrored from their left-to-right forms. Literally, “as the ox turns,” referring to the plowing of a field.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Braille

A writing system using a series of raised dots to be read with the fingers by people who are blind or whose eyesight is not sufficient for reading printed material. (See Section 15.10, Braille.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Braille Pattern

One of the 64 (for six-dot Braille) or 256 (for eight-dot Braille) possible tangible dot combinations.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
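The counts follow from treating each dot as one bit: 2^6 = 64 and 2^8 = 256. Unicode's Braille Patterns block (U+2800 to U+28FF, a detail not stated in this entry) uses exactly that layout, with dot n mapped to bit n-1. The function name below is hypothetical.

```python
import unicodedata

def braille_pattern(dots) -> str:
    """Return the Unicode Braille pattern for a collection of dot numbers (1-8)."""
    bits = 0
    for dot in dots:
        if not 1 <= dot <= 8:
            raise ValueError(f"dot number out of range: {dot}")
        bits |= 1 << (dot - 1)
    return chr(0x2800 + bits)

print(unicodedata.name(braille_pattern([1])))
print(2 ** 6, 2 ** 8)  # six-dot and eight-dot pattern counts
```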

[M] - bundle

see variant bundle.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Byte Order Mark

The Unicode character U+FEFF when used to indicate the byte order of a text. (See Section 2.13, Special Characters and Noncharacters, and Section 16.8, Specials.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
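How U+FEFF reveals byte order can be sketched for UTF-16 with the constants in Python's standard `codecs` module. The function name is hypothetical, and the sketch deliberately ignores UTF-8 and UTF-32 signatures (the UTF-32 little-endian signature begins with the same two bytes as the UTF-16 one).

```python
import codecs

def detect_utf16_order(data: bytes) -> str:
    """Guess UTF-16 byte order from a leading byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE (byte-swapped)
        return "little-endian"
    return "unknown"

print(detect_utf16_order("\ufeffhi".encode("utf-16-be")))
print(detect_utf16_order("\ufeffhi".encode("utf-16-le")))
```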

[M] - Byte Serialization

The order of a series of bytes determined by a computer architecture.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Byte-Swapped

Reversal of the order of a sequence of bytes.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - caching name server

A domain name server that remembers the results of previous lookups in a cache to speed future lookups. Usually used in combination with recursive name server functionality.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - caching resolver

The combination of a recursive name server and a caching name server.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
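The caching behaviour can be sketched as a thin wrapper around some upstream lookup. Everything here is hypothetical: the class name, the fixed time-to-live, and the `upstream` callable standing in for the recursive lookup. Real resolvers honour the TTL carried in each DNS record instead of a single fixed value.

```python
import time
from typing import Callable, Dict, Tuple

class CachingResolver:
    """Remember answers from an upstream lookup for a fixed time-to-live."""

    def __init__(self, upstream: Callable[[str], str], ttl: float = 300.0):
        self._upstream = upstream
        self._ttl = ttl
        self._cache: Dict[str, Tuple[str, float]] = {}

    def lookup(self, name: str) -> str:
        now = time.monotonic()
        hit = self._cache.get(name)
        if hit is not None and hit[1] > now:
            return hit[0]                     # answered from the cache
        answer = self._upstream(name)         # stand-in for the recursive lookup
        self._cache[name] = (answer, now + self._ttl)
        return answer

calls = []
def fake_upstream(name: str) -> str:
    calls.append(name)
    return "192.0.2.1"

resolver = CachingResolver(fake_upstream)
resolver.lookup("example.org")
resolver.lookup("example.org")
print(len(calls))  # the upstream was consulted once
```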

[M] - Canonical

(1) Conforming to the general rules for encoding—that is, not compressed, compacted, or in any other form specified by a higher protocol. (2) Characteristic of a normative mapping and form of equivalence specified in Chapter 3, Conformance.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Canonical Composition

A step in the algorithm for Unicode Normalization Forms, during which decomposed sequences are replaced by primary composites, where possible. (See definition D115 in Section 3.11, Normalization Forms.)
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Canonical Decomposable Character

A character that is not identical to its canonical decomposition. (See definition D69 in Section 3.7, Decomposition.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Canonical Decomposition

Mapping to an inherently equivalent sequence, for example, mapping ä to a + combining umlaut. (For a full, formal definition, see definition D68 in Section 3.7, Decomposition.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
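The ä example can be reproduced with Python's standard `unicodedata` module, where normalization form NFD applies canonical decomposition and NFC applies canonical composition afterwards.

```python
import unicodedata

composed = "\u00e4"                                  # ä as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # a + COMBINING DIAERESIS

print([f"U+{ord(ch):04X}" for ch in decomposed])
print(unicodedata.decomposition(composed))           # the raw mapping from the character database
print(unicodedata.normalize("NFC", decomposed) == composed)
```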

[M] - Canonical Equivalent

Two character sequences are said to be canonical equivalents if their full canonical decompositions are identical. (See definition D70 in Section 3.7, Decomposition.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
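Read literally, the definition suggests a comparison of fully decomposed forms. A minimal sketch, assuming Python's `unicodedata.normalize` with form NFD as the full canonical decomposition; the function name is hypothetical.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings by their full canonical decompositions (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(canonically_equivalent("\u00e4", "a\u0308"))  # precomposed vs. combining sequence
print(canonically_equivalent("\u00c5", "\u212b"))   # Å vs. ANGSTROM SIGN
print(canonically_equivalent("\u00e4", "a"))
```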

[M] - CANONICALIZATION

Reducing the unnecessary differences.
(source: ALFA, author: JFC Morfin, text listed to be reviewed and possibly changed).

[M] - Cantillation Mark

A mark that is used to indicate how a text is to be chanted or sung.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Capital Letter

Synonym for uppercase letter. (See case.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - case

Case is the feature of certain alphabets where the letters have two (or occasionally more) distinct forms. These variants, which may differ markedly in shape and size, are called the uppercase letter (also known as capital or majuscule) and the lowercase letter (also known as small or minuscule). Case mapping is the association of the uppercase and lowercase forms of a letter. <UNICODE>
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
There is usually (but not always) a one-to-one mapping between the same letter in the two cases. However, there are many examples of characters which exist in one case but for which there is no corresponding character in the other case or for which there is a special mapping rule, such as the Turkish dotless "i", some Greek characters with modifiers, and characters like the German Sharp S (Eszett) and Greek Final Sigma that traditionally do not have uppercase forms. Case mapping can even be dependent on locale or language. Converting text to have only a single case, primarily for comparison purposes, is called "case folding". Because of the various unusual cases, case mapping can be quite controversial and some case folding algorithms even more so.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
(1) Feature of certain alphabets where the letters have two distinct forms. These variants, which may differ markedly in shape and size, are called the uppercase letter (also known as capital or majuscule) and the lowercase letter (also known as small or minuscule). (2) Normative property of characters, consisting of uppercase, lowercase, and titlecase (Lu, Ll, and Lt). (See Section 4.2, Case—Normative.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
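As a minimal sketch of the special mappings described above, the following uses Python's built-in string methods. It assumes the default, locale-independent Unicode case operations that Python applies; the variable and function names are illustrative only and do not come from the cited sources.

```python
# Default (locale-independent) Unicode case operations in Python 3.

sharp_s = "\u00df"       # LATIN SMALL LETTER SHARP S
dotless_i = "\u0131"     # LATIN SMALL LETTER DOTLESS I
final_sigma = "\u03c2"   # GREEK SMALL LETTER FINAL SIGMA

# Uppercasing sharp s expands to two letters, so the mapping is not one-to-one.
sharp_s_upper = sharp_s.upper()

# Dotless i uppercases to plain "I", which lowercases back to dotted "i",
# so the round trip does not return the original character.
round_trip = dotless_i.upper().lower()

# Case folding maps final sigma and ordinary small sigma to the same character.
sigma_folded = final_sigma.casefold() == "\u03c3".casefold()


def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings after default case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


print(sharp_s_upper, round_trip, sigma_folded)
print(caseless_equal("Stra\u00dfe", "STRASSE"))
```

Locale-dependent mappings, such as the Turkish treatment of dotted and dotless "i", are not covered by these default operations and would need a locale-aware library.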

[M] - Case Mapping

The association of the uppercase, lowercase, and titlecase forms of a letter. (See Section 5.18, Case Mappings.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Case-Ignorable

A character C is defined to be case-ignorable if C has the value MidLetter or the value MidNumLet for the Word_Break property or its General_Category is one of Nonspacing_Mark (Mn), Enclosing_Mark (Me), Format (Cf), Modifier_Letter (Lm), or Modifier_Symbol (Sk). (See definition D121 in Unicode 5.1.0.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Case-Ignorable Sequence

A sequence of zero or more case-ignorable characters. (See definition D122 in Section 3.13, Default Case Algorithms.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - ccNSO

see Country-code Name Supporting Organisation.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - CCS

(1) Acronym for coded character set. (2) Also used as an acronym for combining character sequence.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - ccTLD

see country-code top-level domain.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Cedilla

A mark originally placed beneath the letter c in French, Portuguese, and Spanish to indicate that the letter is to be pronounced as an s, as in façade. Obsolete Spanish diminutive of ceda, the letter z.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - CEF

Acronym for character encoding form.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - CES

Acronym for character encoding scheme.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - chain of trust

A property of an Internet resource where the delegation of responsibility from one party to another can be verified because there is a chain of custody that can be cryptographically verified using electronic certificates. To verify this chain of trust, the chain must be valid and unbroken all the way from a known trust anchor to the resource in question.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Character

(1) The smallest component of written language that has semantic value; refers to the abstract meaning and/or shape, rather than a specific shape (see also glyph), though in code tables some form of visual representation is essential for the reader’s understanding. (2) Synonym for abstract character. (3) The basic unit of encoding for the Unicode character encoding. (4) The English name for the ideographic written elements of Chinese origin. [See ideograph (2).]
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Character Block

(See block.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Character Class

A set of characters sharing a particular set of properties.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Character Encoding Form

A character encoding form is a mapping from a coded character set (CCS) to the actual code units used to represent the data.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Mapping from a character set definition to the actual code units used to represent the data.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
A character encoding scheme is a character encoding form plus byte serialization. There are seven character encoding schemes in Unicode: UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - character encoding scheme

A character encoding scheme (CES) is a character encoding form plus byte serialization. There are many character encoding schemes in Unicode, such as UTF-8 and UTF-16BE. Some CESs are associated with a single CCS; for example, UTF-8 [RFC3629] applies only to the identical CCSs of ISO/IEC 10646 and Unicode. Other CESs, such as ISO 2022, are associated with many CCSs.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
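To illustrate "encoding form plus byte serialization", the sketch below serializes a single character with several Unicode encoding schemes using Python's standard codecs. The codec names ("utf-16-be" and so on) are Python's spellings, and the chosen character is only an example.

```python
# Serialize one character with the fixed-byte-order encoding schemes.
euro = "\u20ac"  # EURO SIGN

schemes = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]
serialized = {name: euro.encode(name) for name in schemes}

for name, octets in serialized.items():
    print(f"{name:10} {octets.hex(' ')}")

# The unmarked "utf-16" scheme prepends a byte order mark when encoding;
# Python uses the platform's native byte order for the data that follows.
with_bom = euro.encode("utf-16")
print(f"{'utf-16':10} {with_bom.hex(' ')}")
```

The same code point yields different octet sequences under each scheme, which is why a receiver needs to know the scheme (or find a byte order mark) before decoding.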

[M] - Character Name

A unique string used to identify each abstract character encoded in the standard. (See definition D4 in Section 3.3, Semantics.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Character Name Alias

An additional unique string identifier, other than the character name, associated with an encoded character in the standard. (See definition D5 in Section 3.3, Semantics.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Character Properties

A set of property names and property values associated with individual characters. (See Chapter 4, Character Properties.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Character Repertoire

The collection of characters included in a character set.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Character Sequence

Synonym for abstract character sequence.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Character Set

A collection of elements used to represent textual information.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Character Variant

In a Language Variant Table, a second list of Code Points corresponding to each Valid Code Point and providing possible substitutions for it. Unlike the Preferred Variants, substitutions based on Character Variants are normally reserved but not actually registered (or "activated"). Character Variants appear in column 3 of the Language Variant Table. The term "Code Point Variants" is used interchangeably with this term. (RFC 3743)
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).
Character Variant: In a Language Variant Table, a second list of Code Points corresponding to each Valid Code Point and providing possible substitutions for it. Unlike the Preferred Variants, substitutions based on Character Variants are normally reserved but not actually registered (or "activated"). Character Variants appear in column 3 of the Language Variant Table. The term "Code Point Variants" is used interchangeably with this term.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).

[M] - Character Variant Label

A U-label generated from a Fundamental Label by use of Character Variants. The Character Variant Label must contain at least one Character Variant, but need not contain all the Character Variants possible for the Fundamental Label. This definition differs from that in RFC 3743 by specifying “U-label” rather than “label”.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).
Character Variant Label: A label generated by use of Character Variants.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).

[M] - charset

A charset is a method of mapping a sequence of octets to a sequence of abstract characters. A charset is, in effect, a combination of one or more CCSs with a CES. Charset names are registered by the IANA according to procedures documented in [RFC2978]. Many protocol definitions use the term "character set" in their descriptions. The terms "charset" or "character encoding scheme" and "coded character set" are strongly preferred over the term "character set" because "character set" has other definitions in other contexts and this can be confusing.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
(See coded character set.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - charset identification

Specification of the charset used for a string of text. <NONE> Protocols that allow more than one charset to be used in the same place should require that the text be identified with the appropriate charset. Without this identification, a program looking at the text cannot definitively discern the charset of the text. Charset identification is also called "charset tagging".
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
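The following sketch shows why identification matters: one octet sequence decodes to different text, or fails to decode, depending on the charset assumed. The charset labels are Python codec names, and decode_tagged is a hypothetical helper, not part of any protocol.

```python
# One octet sequence, three candidate charsets.
octets = b"caf\xe9"

decoded = {}
for charset in ("iso-8859-1", "cp437", "utf-8"):
    try:
        decoded[charset] = octets.decode(charset)
    except UnicodeDecodeError:
        decoded[charset] = None  # not a valid octet sequence in this charset
    print(charset, repr(decoded[charset]))


def decode_tagged(data: bytes, charset: str) -> str:
    """Decode data using the charset label supplied alongside it."""
    return data.decode(charset)


latin1_text = decode_tagged(octets, "iso-8859-1")
cp437_text = decode_tagged(octets, "cp437")
```

Without the tag, a program can only guess among these readings.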

[M] - Chillu

Abbreviation for chilaaksharam (singular) (cillakṣaram). Refers to any of a set of sonorant consonants in Malayalam, when appearing in syllable-final position with no inherent vowel.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Choseong

A sequence of one or more leading consonants in Korean.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Chu Hán

The name for Han characters used in Vietnam; derived from hànzì.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Chu Nôm

A demotic script of Vietnam developed from components of Han characters. Its creators used methods similar to those used by the Chinese in creating Han characters.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - CJK

Acronym for Chinese, Japanese, and Korean. A variant, CJKV, means Chinese, Japanese, Korean, and Vietnamese.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - CJK Characters

CJK Characters: CJK characters are characters commonly used in the Chinese, Japanese, or Korean languages, including but not limited to those defined in the Unicode Standard as ASCII (U+0020 to U+007F), Han ideographs (U+3400 to U+9FAF and U+20000 to U+2A6DF), Bopomofo (U+3100 to U+312F and U+31A0 to U+31BF), Kana (U+3040 to U+30FF), Jamo (U+1100 to U+11FF and U+3130 to U+318F), Hangul (U+AC00 to U+D7AF and U+3130 to U+318F), and the respective compatibility forms. The particular characters that are permitted in a given zone are specified in the Language Variant Table(s) for that zone.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).
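The ranges in this definition can be written down as a lookup table. The sketch below is only an illustration of the listed ranges: the table name, the function name, and the choice to return every matching group (the Jamo and Hangul entries share U+3130 to U+318F) are assumptions, and the compatibility forms mentioned in the definition are not included.

```python
# Code point ranges copied from the definition above (inclusive bounds).
CJK_RANGES = [
    ("ASCII", 0x0020, 0x007F),
    ("Han", 0x3400, 0x9FAF),
    ("Han", 0x20000, 0x2A6DF),
    ("Bopomofo", 0x3100, 0x312F),
    ("Bopomofo", 0x31A0, 0x31BF),
    ("Kana", 0x3040, 0x30FF),
    ("Jamo", 0x1100, 0x11FF),
    ("Jamo", 0x3130, 0x318F),
    ("Hangul", 0xAC00, 0xD7AF),
    ("Hangul", 0x3130, 0x318F),
]


def cjk_groups(ch: str) -> list[str]:
    """Return every group from the table whose range contains ch."""
    labels = []
    for label, low, high in CJK_RANGES:
        if low <= ord(ch) <= high and label not in labels:
            labels.append(label)
    return labels


for sample in ("A", "\u6f22", "\u3042", "\u3105", "\ud55c", "\u3131"):
    print(f"U+{ord(sample):04X}", cjk_groups(sample))
```

Whether a given code point is actually permitted still depends on the Language Variant Table for the zone, which this table does not model.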

[M] - CJK characters and Han characters

The ideographic characters used in Chinese, Japanese, Korean, and traditional Vietnamese writing systems are often called 'CJK characters' after the initial letters of the language names in English. They are also called "Han characters", after the term in Chinese that is often used for these characters. <NONE> Note that Han characters do not include the phonetic characters used in the Japanese and Korean languages. Users of the term "CJK characters" may or may not assume those additional characters are included.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
In ISO/IEC 10646, the Han characters were "unified", meaning that each set of Han characters from Japanese, Chinese, and/or Korean that had the same origin was assigned a single code point. The positive result of this was that many fewer code points were needed to represent Han; the negative result of this was that characters that people who write the three languages think are different have the same code point. There is a great deal of disagreement on the nature, the origin, and the severity of the problems caused by Han unification.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - clandestine redelegation

The act of performing a redelegation by changing the practical details (i.e. the contact details and/or name server records) of a top-level domain subversively, rather than applying for a redelegation using proper procedure.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - CLDR

(See Unicode Common Locale Data Repository.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Code Page

A coded character set, often referring to a coded character set used by a personal computer—for example, PC code page 437, the default coded character set used by the U.S. English version of the DOS operating system.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Code Point

A value in the Unicode code space. The meaning here is restricted to meaning D10 in the Unicode Standard, section 3.4.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).
A value in the codespace of a repertoire. For all common repertoires developed in recent years, code point values are integers (code points for ASCII and its immediate descendants were defined in terms of column and row positions of a table).
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
(1) Any value in the Unicode codespace; that is, the range of integers from 0 to 10FFFF₁₆. (See definition D10 in Section 3.4, Characters and Encoding.) (2) A value, or position, for a character, in any coded character set.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
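As a small sketch, Python exposes code points as plain integers through the built-in ord and chr functions. The constant and helper names below are illustrative assumptions.

```python
UNICODE_MAX = 0x10FFFF  # last code point in the Unicode codespace


def is_code_point(value: int) -> bool:
    """True if value lies in the Unicode codespace 0..10FFFF (hexadecimal)."""
    return 0 <= value <= UNICODE_MAX


code_point = ord("\u00e9")   # character -> code point (an integer)
character = chr(0x1F600)     # code point -> character

# chr() rejects integers outside the codespace.
try:
    chr(UNICODE_MAX + 1)
    beyond_accepted = True
except ValueError:
    beyond_accepted = False

print(hex(code_point), is_code_point(UNICODE_MAX), beyond_accepted)
```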

[M] - Code Point Type

Any of the seven fundamental classes of code points in the standard: Graphic, Format, Control, Private-Use, Surrogate, Noncharacter, Reserved. (See definition D10a in Section 3.4, Characters and Encoding.)
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Code Position

Synonym for code point. Used in ISO character encoding standards.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Code Set

[M] - code table

A code table is a table showing the characters allocated to the octets in a code. <ISOIEC10646> Code tables are also commonly called "code charts".
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Code Unit

The minimal bit combination that can represent a unit of encoded text for processing or interchange. The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form. (See definition D77 in Section 3.9, Unicode Encoding Forms.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
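The difference between code points and code units can be seen by counting the units each encoding form needs for the same text. This is a minimal sketch; code_unit_counts is a hypothetical helper, and the little-endian codecs are used only to avoid counting a byte order mark.

```python
def code_unit_counts(text: str) -> dict[str, int]:
    """Count code units needed for text in each Unicode encoding form."""
    return {
        "utf-8": len(text.encode("utf-8")),            # 8-bit units
        "utf-16": len(text.encode("utf-16-le")) // 2,  # 16-bit units
        "utf-32": len(text.encode("utf-32-le")) // 4,  # 32-bit units
    }


for sample in ("A", "\u20ac", "\U0001F600"):
    print(f"U+{ord(sample):04X}", code_unit_counts(sample))
```

A single code point above U+FFFF takes two 16-bit code units in UTF-16, which is why string lengths measured in code units and in code points can differ.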

[M] - Code Value

Obsolete synonym for code unit.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - coded character

A character together with its coded representation. Coded character set: a coded character set (CCS) is a set of unambiguous rules that establishes a character set and the relationship between the characters of the set and their coded representation.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Coded Character Representation

Synonym for coded character sequence.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Coded Character Sequence

An ordered sequence of one or more code points. Normally, this consists of a sequence of encoded characters, but it may also include noncharacters or reserved code points. (See definition D12 in Section 3.4, Characters and Encoding.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Coded Character Set

A character set in which each character is assigned a numeric code point. Frequently abbreviated as character set, charset, or code set; the acronym CCS is also used.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Codespace

(1) A range of numerical values available for encoding characters. (2) For the Unicode Standard, a range of integers from 0 to 10FFFF₁₆. (See definition D9 in Section 3.4, Characters and Encoding.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Collation

The process of ordering units of textual information. Collation is usually specific to a particular language. Also known as alphabetizing or alphabetic sorting. Unicode Technical Standard #10, “Unicode Collation Algorithm,” defines a complete, unambiguous, specified ordering for all characters in the Unicode Standard.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - combining character

A member of an identified subset of the coded character set of ISO/IEC 10646 intended for combination with the preceding non-combining graphic character, or with a sequence of combining characters preceded by a non-combining character. Combining characters are inherently non-spacing. <ISOIEC10646> Composite sequence or combining character sequence: a sequence of graphic characters consisting of a non-combining character followed by one or more combining characters. A graphic symbol for a composite sequence generally consists of the combination of the graphic symbols of each character in the sequence. The Unicode Standard often uses the term "combining character sequence" to refer to composite sequences. A composite sequence is not a character and therefore is not a member of the repertoire of ISO/IEC 10646. <ISOIEC10646> However, Unicode now assigns names to some such sequences, especially when the names are required to match terminology in other standards [UAX34].
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
In some CCSs, some characters consist of combinations of other characters. For example, the letter "a with acute" might be a combination of the two characters "a" and "combining acute", or it might be a combination of the three characters "a", a non-destructive backspace, and an acute. In the same or other CCSs, it might be available as a single code point. The rules for combining two or more characters are called "composition rules", and the rules for taking apart a character into other characters are called "decomposition rules". The result of composition is called a "precomposed character"; the result of decomposition is called a "decomposed character".
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
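For Unicode specifically, the "a with acute" example can be sketched with Python's unicodedata module, using Normalization Form C as the composition step and Normalization Form D as the decomposition step. The variable names are illustrative.

```python
import unicodedata

decomposed = "a\u0301"   # "a" followed by COMBINING ACUTE ACCENT
precomposed = "\u00e1"   # LATIN SMALL LETTER A WITH ACUTE

# Composition: two code points become one precomposed character.
composed_result = unicodedata.normalize("NFC", decomposed)

# Decomposition: the precomposed character is taken apart again.
decomposed_result = unicodedata.normalize("NFD", precomposed)

print(len(decomposed), len(composed_result), len(decomposed_result))
```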

[M] - Combining Character Sequence

A maximal character sequence consisting of either a base character followed by a sequence of one or more characters where each is a combining character, zero width joiner, or zero width non-joiner; or a sequence of one or more characters where each is a combining character, zero width joiner, or zero width non-joiner. (See definition D56 in Section 3.6, Combination.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Combining Class

A numeric value in the range 0..255 given to each Unicode code point, formally defined as the property Canonical_Combining_Class. (See definition D104 in Section 3.11, Canonical Ordering Behavior.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
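A short sketch of how the combining class is used: Python's unicodedata.combining returns the Canonical_Combining_Class value, and normalization reorders adjacent combining marks by that value. The sample marks are chosen only for illustration.

```python
import unicodedata

acute = "\u0301"      # COMBINING ACUTE ACCENT (rendered above the base)
dot_below = "\u0323"  # COMBINING DOT BELOW

# Base letters have class 0; combining marks have a non-zero class.
classes = {
    unicodedata.name(ch): unicodedata.combining(ch)
    for ch in ("a", acute, dot_below)
}
print(classes)

# Canonical ordering sorts adjacent marks by ascending combining class,
# so the dot below is moved before the acute accent.
unordered = "a" + acute + dot_below
ordered = unicodedata.normalize("NFD", unordered)
print([f"U+{ord(ch):04X}" for ch in ordered])
```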

[M] - Combining Mark

A commonly used synonym for combining character.
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Comparison of Unicode strings

Comparison of Unicode strings is not as easy as comparing ASCII strings. First, there are a multitude of ways to represent a string of Unicode characters. Second, in many languages and scripts, the actual definition of "same" is very context-dependent. Because of this, comparison of two Unicode strings must take into account how the Unicode strings are encoded. Regardless of the encoding, however, comparison cannot simply be done by comparing the encoded Unicode strings byte by byte. The only time that is possible is when the strings are both mapped into some canonical form and encoded the same way.
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).
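The point can be sketched as follows. The choice of Normalization Form C as the "canonical form" is an assumption made for this example (the text above does not name one), and the helper names are hypothetical.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after mapping both to NFC (assumed canonical form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def same_octets(a: bytes, a_encoding: str, b: bytes, b_encoding: str) -> bool:
    """Decode each octet string by its declared encoding, then compare."""
    return same_text(a.decode(a_encoding), b.decode(b_encoding))


precomposed = "caf\u00e9"   # e with acute as one code point
decomposed = "cafe\u0301"   # e followed by a combining acute accent

# Byte-by-byte comparison fails even with the same encoding ...
print(precomposed.encode("utf-8") == decomposed.encode("utf-8"))

# ... while comparison after normalization succeeds, across encodings too.
print(same_octets(precomposed.encode("utf-8"), "utf-8",
                  decomposed.encode("utf-16-le"), "utf-16-le"))
```

This sketch addresses only representation differences; it does not attempt the context-dependent notions of "same" mentioned above.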

[M] - Compatibility

(1) Consistency with existing practice or preexisting character encoding standards. (2) Characteristic of a normative mapping and form of equivalence specified in Section 3.7, Decomposition.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Compatibility Character

A character that would not have been encoded except for compatibility and round-trip convertibility with other standards. (See Section 2.3, Compatibility Characters.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - compatibility character or compatibility variant

A graphic character included as a coded character of ISO/IEC 10646 primarily for compatibility with existing coded character sets.

<ISOIEC10646> The Unicode definition of compatibility character also includes characters that have been incorporated for other reasons. Their list includes several separate groups of characters included for compatibility purposes: halfwidth and fullwidth characters used with East Asian scripts, Arabic contextual forms (e.g., initial or final forms), some ligatures, deprecated formatting characters, variant forms of characters (or even copies of them) for particular uses (e.g., phonetic or mathematical applications), font variations, CJK compatibility ideographs, and so on. For additional information and the separate term "compatibility decomposable character", see the Unicode standard.
For example, U+FF01 (FULLWIDTH EXCLAMATION MARK) was included for compatibility with Asian character sets that include full-width and half-width ASCII characters.

Some efforts in the IETF have concluded that it would be useful to support mapping of some groups of compatibility equivalents and not others (e.g., supporting or mapping width variations while preserving or rejecting mathematical variations).
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
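One way to sketch such selective mapping is to inspect the compatibility formatting tag that the Unicode Character Database attaches to each decomposition mapping, which Python exposes through unicodedata.decomposition. The function name and the decision to fold only the "wide" and "narrow" tags are assumptions for illustration, not a procedure defined by the IETF.

```python
import unicodedata

# Compatibility formatting tags that mark width variants.
WIDTH_TAGS = ("<wide>", "<narrow>")


def fold_width(text: str) -> str:
    """Map fullwidth and halfwidth variants; leave all other characters alone."""
    out = []
    for ch in text:
        mapping = unicodedata.decomposition(ch).split()
        if mapping and mapping[0] in WIDTH_TAGS:
            # The remaining fields are hexadecimal code points of the target.
            out.append("".join(chr(int(cp, 16)) for cp in mapping[1:]))
        else:
            out.append(ch)
    return "".join(out)


print(fold_width("\uff21\uff22\uff01"))                  # fullwidth "AB!"
print(fold_width("\U0001d400") == "\U0001d400")          # mathematical bold A is kept
```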

[M] - Compatibility Composite Character

Synonym for compatibility decomposable character.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
A character whose compatibility decomposition is not identical to its canonical decomposition. (See definition D66 in Section 3.7, Decomposition.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Compatibility Decomposable Character

[M] - Compatibility Decomposition

Mapping to a roughly equivalent sequence that may differ in style. (For a full, formal definition, see definition D65 in Section 3.7, Decomposition.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Compatibility Decomposition

Two character sequences are said to be compatibility equivalents if their full compatibility decompositions are identical. (See definition D67 in Section 3.7, Decomposition.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
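A minimal sketch of this test, assuming that comparing Normalization Form KD results is an acceptable stand-in for comparing full compatibility decompositions (and Form D for canonical ones). The function names are hypothetical.

```python
import unicodedata


def compatibility_equivalent(a: str, b: str) -> bool:
    """True if a and b have identical NFKD forms."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b have identical NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


ligature = "\ufb01"  # LATIN SMALL LIGATURE FI

# The ligature is a compatibility equivalent of "fi", not a canonical one.
print(compatibility_equivalent(ligature, "fi"), canonically_equivalent(ligature, "fi"))

# Canonical equivalents are also compatibility equivalents.
print(compatibility_equivalent("\u00e9", "e\u0301"), canonically_equivalent("\u00e9", "e\u0301"))
```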

[M] - Compatibility Equivalent

[M] - Compatibility Precomposed Character

[M] - Compatibility Variant

A character that generally can be remapped to another character without loss of information other than formatting.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Composite Character

[M] - Composite Character Sequence

(See combining character sequence.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Composite-character variants:

Abstract Characters that do not have a Code Point assigned, but can be represented by multiple Code Points.
(source: ICANN, author: WG/VIP, text listed to be reviewed and possibly changed).

[M] - Composition Exclusion

Composition Exclusion. A Canonical Decomposable Character which has the property value Composition_Exclusion=True. (Used in the definition of Unicode Normalization Forms.) (See definition D112 in Section 3.11, Normalization Forms.)
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Conformance

Adherence to a specified set of criteria for use of a standard. (See Chapter 3, Conformance.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Conjunct Form

A ligated form representing a consonant conjunct.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - CONSISTENT

The differences do not conflict.
(source: ALFA, author: JFC Morfin, text listed to be reviewed and possibly changed).

[M] - Consonant Cluster

A sequence of two or more adjacent consonantal letterforms, consisting of a sequence of one or more dead consonants followed by a normal, live consonant letter. A consonant conjunct may be ligated into a single conjunct form, or it may be represented by graphically separable parts, such as subscripted forms of the consonant letters. Consonant conjuncts are associated with the Brahmi family of Indic scripts. (See Section 9.1, Devanagari.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Consonant Conjunct

[M] - Contextual Variant

[M] - Contributory Property

A simple property defined merely to make the statement of a rule defining a derived property more compact or general. (See definition D35a in Section 3.5, Properties.)
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Control Codes

The 65 characters in the ranges U+0000..U+001F and U+007F..U+009F. Also known as control characters.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
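The count of 65 can be checked with a few lines of Python. The sketch below counts the two ranges directly and, as a cross-check, counts the code points whose General_Category is Cc in the Unicode data shipped with the interpreter; the names are illustrative.

```python
import unicodedata

# The two control code ranges, as inclusive bounds.
CONTROL_RANGES = [(0x0000, 0x001F), (0x007F, 0x009F)]


def is_control(ch: str) -> bool:
    """True if ch falls in one of the control code ranges."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in CONTROL_RANGES)


# 32 code points in the first range plus 33 in the second.
control_count = sum(high - low + 1 for low, high in CONTROL_RANGES)

# Cross-check against the General_Category value "Cc" over the whole codespace.
cc_count = sum(
    1 for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Cc"
)

print(control_count, cc_count)
```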

[M] - Core Specification

A core part of the Unicode Standard. Formally, a version of the Unicode Standard is defined by an edition of the core specification, The Unicode Standard, together with the Code Charts, Unicode Standard Annexes and the Unicode Character Database.
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Country-code Name Supporting Organisation (ccNSO)

A component of ICANN’s policy development forums (a “constituency”) that is responsible for discussing and developing policy relating to how ccTLDs are delegated.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Country-code top-level domain (ccTLD)

A class of top-level domains only assignable to represent countries listed in the ISO 3166-1 standard. At present these are two-letter codes like “.UK”, “.DE” etc., however in the future it is expected there will be non-Latin equivalents also available. Much of the policy-making for individual country-code top-level domains is vested with a local sponsoring organisation, as opposed to other top-level domains where ICANN sets the policy. It is a requirement that ccTLDs are operated within the country they are designated so appropriate local laws, governments etc. have a say in how the domain is run.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - CRISP

see Cross-Registry Information Service Protocol.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Cross-Registry Information Service Protocol (CRISP)

The name of the working group at the IETF that developed the Internet Registry Information Service (IRIS), a next-generation WHOIS protocol replacement.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Cursive

Writing where the letters of a word are connected.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Cybship/Cybcraft

The cybship is an image that is taken of a person with his or her digital environment, just as there are seaships or spaceships. Just as there are watercrafts or spacecrafts, a cybcraft is an image that is taken for a person with the sole online digital processor that he/she is currently using.
(source: ALFA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).
Using this paradigm matches the WSIS (World Summit on the Information Society) demand for a "people centered/à caractère humain/centrada en la persona" information society. It helps in thinking about the real nature of the world digital ecosystem, of which the Internet is a major part, as being a complex (i.e. intricate) network obeying the Cosmological principle.
(source: ALFA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).
As being the "kubernetes" (commanding officer, pilot, helmsman, lead user, etc. along Plato's original concept) of his or her cybship, each person is to assume his/her e-empowerment and decide about the short-term operations, mid-term governance, and long-term adminance of his/her cybship: people are the free masters and commanders of their digital experience and the center of their own "e-world". That is, if they have the liberty to enact the 1st law of cybernetics: "The unit within the system with the most behavioral responses available to it controls the system"; the danger for them is some other (commercial, financial, technical, political, etc.) environmental, contextual, or foreign influences directly or indirectly providing more behavioral responses. The ship image also helps in understanding that one can sail using the influences and counter influences of units of power, the same as the waves and winds (on sea) or attractions (in space), to steer one's own tack.
(source: ALFA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - Dasia

Greek term for rough breathing mark, used in polytonic Greek character names.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - DATA

The differences necessary for a process.
(source: ALFA, author: JFC Morfin, text listed to be reviewed and possibly changed).

[M] - Datacommunications

  • value-added service (ISO Layer 1 to 7): over network links on an end to end basis in using the data added in the packet header. Logical exchanges of dumb content (what you send is what you receive)
    (source: ALFA/OSEX, author: Jefsey Morfin, text listed to be reviewed and possibly changed).
  • edge services: contextual content (what you receive is what you were intended to receive) managed on an edge to edge basis in routing the calls through specialized Opened Pluggable Edge Services. They are supported by two additional layers: the interoperation layers and the "network edge" layer (layer 8 and 9). These two layers are also called the "plugged layers on the user side" (PLUS, and "+" on the server side).
    (source: ALFA/OSEX, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - DBCS

Acronym for double-byte character set.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - DCHK

A subset of IRIS for performing checks on whether a domain name is available to register. It is more lightweight, and has fewer privacy implications, than DREG, as it does not transmit registration data other than simple availability.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Dead Consonant

An Indic consonant character followed by a virama character. This sequence indicates that the consonant has lost its inherent vowel. (See Section 9.1, Devanagari.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Decimal Digits

Digits that can be used to form decimal-radix numbers.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Decomposable Character

A character that is equivalent to a sequence of one or more other characters, according to the decomposition mappings found in the Unicode Character Database, and those described in Section 3.12, Conjoining Jamo Behavior. It may also be known as a precomposed character or a composite character. (See definition D63 in Section 3.7. Decomposition.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Decomposition

(1) The process of separating or analyzing a text element into component units. These component units may not have any functional status, but may be simply formal units—that is, abstract shapes. (2) A sequence of one or more characters that is equivalent to a decomposable character. (See definition D64 in Section 3.7. Decomposition.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Decomposition Mapping

A mapping from a character to a sequence of one or more characters that is a canonical or compatibility equivalent and that is listed in the character names list or described in Section 3.12, Conjoining Jamo Behavior. (See definition D62 in Section 3.7. Decomposition.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
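As an illustration only (it is not part of the cited glossary), Python's standard `unicodedata` module exposes the decomposition mappings of the Unicode Character Database it ships with. The sketch below assumes U+00E9 and U+FB01 as sample characters, and the helper name `decomposition_mapping` is hypothetical. A leading `<tag>` in the raw mapping marks a compatibility mapping; no tag means a canonical one.

```python
import unicodedata

def decomposition_mapping(ch):
    """Return (kind, code points) for ch's decomposition mapping, or None if it has none."""
    raw = unicodedata.decomposition(ch)   # e.g. "0065 0301" or "<compat> 0066 0069"
    if not raw:
        return None
    parts = raw.split()
    kind = "canonical"
    if parts[0].startswith("<"):          # a formatting tag marks a compatibility mapping
        kind = "compatibility"
        parts = parts[1:]
    return kind, [int(p, 16) for p in parts]

e_acute = decomposition_mapping("\u00e9")       # LATIN SMALL LETTER E WITH ACUTE
fi_ligature = decomposition_mapping("\ufb01")   # LATIN SMALL LIGATURE FI
plain_a = decomposition_mapping("a")            # no decomposition mapping
```

Applying the canonical mappings recursively is what `unicodedata.normalize("NFD", ...)` does for a whole string.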

[M] - Default Ignorable

[M] - Defective Combining Character Sequence

A combining character sequence that does not start with a base character. (See definition D57 in Section 3.6, Combination.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
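A rough way to spot such a sequence, sketched in Python under a simplifying assumption: any character whose General Category starts with "M" is treated as a combining character, and everything else as a possible base. The function name `starts_defective` is hypothetical, and the check does not cover zero width joiners or other special cases.

```python
import unicodedata

def starts_defective(text):
    """True if text begins with a combining mark (General Category Mn, Mc or Me)."""
    return bool(text) and unicodedata.category(text[0]).startswith("M")

defective = starts_defective("\u0301e")    # COMBINING ACUTE ACCENT with nothing before it
well_formed = starts_defective("e\u0301")  # base letter followed by the accent
```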

[M] - Delegation

In a DNS context, the act of entering parent-side NS (nameserver) records in a zone, thereby creating a subordinate namespace with its own SOA (start of authority) record. See RFC 1034 for detailed discussion of how the DNS name space is broken up into zones.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).
Any transfer of responsibility to another entity. In the domain name system, one name server can provide pointers to more useful name servers for a given request by returning NS records. On an administrative level, sub-domains are delegated to other entities. IANA also delegates IP address blocks to regional Internet registries.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Demotic Script

(1) A script or a form of a script used to write the vernacular or common speech of some language community. (2) A simplified form of the ancient Egyptian hieratic writing.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Dependent Vowel

A symbol or sign that represents a vowel and that is attached or combined with another symbol, usually one that represents a consonant. For example, in writing systems based on Arabic, Hebrew, and Indic scripts, vowels are normally represented as dependent vowel signs.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Deprecated

Of a coded character or a character property, strongly discouraged from use. (Not the same as obsolete.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Deprecated Character

A coded character whose use is strongly discouraged. Such characters are retained in the standard, but should not be used. (See definition D13 in Section 3.4, Characters and Encoding.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Designated Code Point

Any code point that has either been assigned to an abstract character (assigned characters) or that has otherwise been given a normative function by the standard (surrogate code points and noncharacters). This definition excludes reserved code points. Also known as assigned code point. (See Section 2.4 Code Points and Characters.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - diacritic

A mark applied or attached to a symbol to create a new symbol that represents a modified or new value. They can also be marks applied to a symbol irrespective of whether it changes the value of that symbol. In the latter case, the diacritic usually represents an independent value (for example, an accent, tone, or some other linguistic information). Also called diacritical mark or diacritical. <UNICODE>
control character: The 65 characters in the ranges U+0000..U+001F and U+007F..U+009F.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The basic space character, U+0020, is often considered as a control character as well, making the total number 66. They are also known as control codes or control characters. In terminology adopted by Unicode from ASCII and the ISO 8859 standards, these codes are treated as belonging to three ranges: "C0" (for U+0000..U+001F), "C1" (for U+0080..U+009F), and the single control character "DEL" (U+007F). <UNICODE>
formatting character: Characters that are inherently invisible but that have an effect on the surrounding characters. <UNICODE> Examples of formatting characters include characters for specifying the direction of text and characters that specify how to join multiple characters.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
(1) A mark applied or attached to a symbol to create a new symbol that represents a modified or new value. (2) A mark applied to a symbol irrespective of whether it changes the value of that symbol. In the latter case, the diacritic usually represents an independent value (for example, an accent, tone, or some other linguistic information). Also called diacritical mark or diacritical. (See also combining character and nonspacing mark.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
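The control character ranges quoted in this entry from RFC 3536bis can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the General Category values come from whatever Unicode version the local `unicodedata` module ships.

```python
import unicodedata

C0 = range(0x0000, 0x0020)    # U+0000..U+001F
DEL = range(0x007F, 0x0080)   # U+007F
C1 = range(0x0080, 0x00A0)    # U+0080..U+009F

control_code_points = [*C0, *DEL, *C1]

# Unicode gives all of these the General Category "Cc" (Other, control).
all_are_cc = all(unicodedata.category(chr(cp)) == "Cc" for cp in control_code_points)

# U+0020 SPACE is a space separator ("Zs"), which is why it is only
# "often considered" a control character rather than counted as one.
space_category = unicodedata.category("\u0020")
```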

[M] - Diaeresis

Two horizontal dots over a letter, as in naïve. The diaeresis is not distinguished from the umlaut in the Unicode character encoding. (See umlaut.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Dialytika

Greek term for diaeresis or trema, used in Greek character names.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - digit or number

All modern writing systems use decimal digits in some form; some older ones use non-positional or other systems. Different scripts may have their own digits. Unicode distinguishes between numbers and other kinds of characters by assigning a special General Category value to them and subdividing that value to distinguish between decimal digits, letter digits, and other digits.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
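The three subdivisions correspond to the General Category values Nd (decimal digit), Nl (letter number) and No (other number). A small Python sketch, with sample characters chosen here purely for illustration:

```python
import unicodedata

samples = [
    "7",        # DIGIT SEVEN
    "\u0667",   # ARABIC-INDIC DIGIT SEVEN
    "\u2166",   # ROMAN NUMERAL SEVEN
    "\u00bd",   # VULGAR FRACTION ONE HALF
]

# Map each sample to its General Category as reported by the local Unicode data.
categories = {ch: unicodedata.category(ch) for ch in samples}

# Decimal digits from any script carry a decimal value.
arabic_indic_seven = unicodedata.decimal("\u0667")
```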

[M] - Digits

(See Arabic digits, European digits, and Indic digits.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Digraph

A pair of signs or symbols (two graphs), which together represent a single sound or a single linguistic unit. The English writing system employs many digraphs (for example, th, ch, sh, qu, and so on). The same two symbols may not always be interpreted as a digraph (for example, cathode versus cathouse). When three signs are so combined, they are called a trigraph. More than three are usually called an n-graph.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Dingbats

Typographical symbols and ornaments.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Diphthong

A pair of vowels that are considered a single vowel for the purpose of phonemic distinction. One of the two vowels is more prominent than the other. In writing systems, diphthongs are sometimes written with one symbol and sometimes with more than one symbol (for example, with a digraph).
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Direction

(See paragraph direction.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Directionality Property

A property of every graphic character that determines its horizontal ordering as specified in Unicode Standard Annex #9, “Unicode Bidirectional Algorithm.” (See Section 4.4, Directionality—Normative.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Disallowed

defines which code points and character categories are treated as disallowed during preparation of the string.
(source: IDNA2008, author: Patrik Fältström, text listed to be reviewed and possibly changed).

[M] - Display Cell

A rectangular region on a display device within which one or more glyphs are imaged.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Display Order

The order of glyphs presented in text rendering.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - displaying and rendering text

To display text, a system puts characters on a visual display device such as a screen or a printer. To render text, a system analyzes the character input to determine how to display the text.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The terms "display" and "render" are sometimes used interchangeably. Note, however, that text might be rendered as audio and/or tactile output, such as in systems that have been designed for people with visual disabilities. Combining characters modify the display of the character (or, in some cases, characters) that precede them. When rendering such text, the display engine must either find the glyph in the font that represents the base character and all of the combining characters, or it must render the combination itself. Such rendering can be straight-forward, but it is sometimes complicated when the combining marks interact with each other, such as when there are two combining marks that would appear above the same character. Formatting characters can also change the way that a renderer would display text. Rendering can also be difficult for some scripts that have complex display rules for base characters, such as Arabic and Indic scripts.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Diversity

Until the IDNA2008 RFC set, diversity was addressed by adding new features to the architecture. This turned out to be an architectural limitation, addressed in an "unusual" manner [RFC 5895] through the outside application of the principle of subsidiarity, constrained by the need to fit the Internet services framework documented by RFCs at the Internet Use Interface (IUI). That IUI can be transparently implemented as:
(source: IDNA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).
  • "PLUS" (Pluggable layers on the User side) - usually as a fringe added intelligence.
    (source: ALFA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).
  • "+" on the Host side (e.g. Google+) - usually as a fringe added organized system.
    (source: ALFA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - DNS zone

a section of the Domain Name System name space. By default, the Root Zone contains all domain names; in practice, however, sections of it are delegated into smaller zones in a hierarchical fashion. For example, the “.COM” zone refers to the delegated portion of the DNS name space that ends in “.COM”.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - DNSSEC

A technology that can be added to the Domain Name System to verify the authenticity of its data. It works by adding verifiable chains of trust that can be validated to the domain name system.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - domain name label

a constituent part of a domain name. The labels of domain names are connected by dots. For example, “www.iana.org” contains three labels: “www”, “iana” and “org”. For internationalised domain names, the labels may be referred to as A-labels and U-labels.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
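A minimal Python sketch of splitting a name into labels. The helper names are hypothetical, and the "xn--" test is only a heuristic based on the A-label prefix defined in RFC 5890; it does not validate the label.

```python
def split_labels(name):
    """Split a domain name into its labels, ignoring one trailing root dot."""
    if name.endswith("."):
        name = name[:-1]
    return name.split(".")

def looks_like_a_label(label):
    """Heuristic: A-labels start with the ASCII prefix "xn--" (case-insensitive)."""
    return label.lower().startswith("xn--")

labels = split_labels("www.iana.org")
```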

[M] - domain name registrar

An entity offering domain name registration services, as an agent between registrants and registries. Usually multiple registrars exist who compete with each other, and are accredited. For most generic top-level domains, domain name registrars are accredited by ICANN.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - domain name registry

A registry tasked with managing the contents of a DNS zone, by giving registrations of sub-domains to registrants.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - domain name server

A general term for a system on the Internet that answers requests to convert domain names into something else. These can be subdivided into authoritative name servers, which store the database for a particular DNS zone; as well as recursive name servers and caching name servers.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Domain Name Slot

A "domain name slot" is defined in this document to be a protocol element or a function argument or a return value (and so on) explicitly designated for carrying a domain name. Examples of domain name slots include the QNAME field of a DNS query; the name argument of the gethostbyname() or getaddrinfo() standard C library functions; the part of an email address following the at sign ("@") in the parameter to the SMTP MAIL or RCPT commands or the "From:" field of an email message header; and the host portion of the URI in the "src" attribute of an HTML "<IMG>" tag. A string that has the syntax of a domain name but that appears in general text is not in a domain name slot. For example, a domain name appearing in the plain text body of an email message is not occupying a domain name slot.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
An "IDNA-aware domain name slot" is defined for this set of documents to be a domain name slot explicitly designated for carrying an internationalized domain name as defined in this document. The designation may be static (for example, in the specification of the protocol or interface) or dynamic (for example, as a result of negotiation in an interactive session).
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Name slots that are not IDNA-aware obviously include any domain name slot whose specification predates IDNA. Note that the requirements of some protocols that use the DNS for data storage prevent the use of IDNs. For example, the format required for the underscore labels used by the service location protocol [RFC2782] precludes representation of a non-ASCII label in the DNS using A-labels because those SRV-related labels must start with underscores. Of course, non-ASCII IDN labels may be part of a domain name that also includes underscore labels.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Domain Name System Root

see Root Zone.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - dot [string]

common way of referring to a specific top-level domain. For example “dot info” refers to the “INFO” top-level domain. Written in text as “.INFO”.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Double-Byte Character

[M] - Double-Byte Character Set

One of a number of character sets defined for representing Chinese, Japanese, or Korean text (for example, JIS X 0208-1990). These character sets are often encoded in such a way as to allow double-byte character encodings to be mixed with single-byte character encodings. Abbreviated DBCS. (See also multibyte character set.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
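The mixing of single-byte and double-byte encodings can be seen with Shift_JIS, which encodes JIS X 0208 characters in two bytes alongside single-byte ASCII-range characters. Shift_JIS is chosen here only as a convenient example available in Python's standard codecs; the entry itself does not name it.

```python
text = "A\u3042"   # LATIN CAPITAL LETTER A followed by HIRAGANA LETTER A

encoded = text.encode("shift_jis")

# Number of bytes each character occupies in the encoded form.
byte_lengths = [len(ch.encode("shift_jis")) for ch in text]
```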

[M] - DREG

A subset of IRIS for performing registration lookups on domain names.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Ductility

The ability of a cursive font to stretch or compress the connective baseline to effect text justification.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Dynamic Composition

[M] - E.164

see ENUM.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Leading Consonant

(1) In Korean, a jamo character with the Hangul_Syllable_Type property value Leading_Jamo (in the range U+1100..U+1159 or U+115F hangul choseong filler). Abbreviated as L. (See definition D122 in Section 3.12, Conjoining Jamo Behavior.) (2) Any initial consonant in a syllable.
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).
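A short Python sketch of sense (1). The range test uses exactly the code points quoted in this entry, which later Unicode versions may have extended, and the function name `is_leading_jamo` is hypothetical. Normalization to NFC shows a leading consonant (L) and a vowel (V) composing into one precomposed syllable.

```python
import unicodedata

def is_leading_jamo(ch):
    """Range check as quoted in the entry: U+1100..U+1159 or U+115F."""
    cp = ord(ch)
    return 0x1100 <= cp <= 0x1159 or cp == 0x115F

L = "\u1100"   # HANGUL CHOSEONG KIYEOK
V = "\u1161"   # HANGUL JUNGSEONG A

# Conjoining jamo compose algorithmically under NFC.
syllable = unicodedata.normalize("NFC", L + V)
```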

[M] - ECCS

[M] - EGC

[M] - eIANA

see RZM Automation.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Embedding

[M] - Encapsulated Text

[M] - Enclosing Mark

[M] - Encoded Character

(See character encoding form.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Encoding Form

[M] - Encoding Scheme

(See character encoding scheme.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Encodings

In 1996 the IAB sponsored a workshop on character sets and encodings [RFC 2130]. This document adds to that discussion and focuses on the importance of agreeing on a single encoding and how complicated the state of affairs ends up being as a result of using different encodings today.
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).
Different applications, APIs, and protocols use different encoding schemes today. Many of them were originally defined to use only ASCII. Internationalizing Domain Names in Applications (IDNA) [RFC 5890] defines a mechanism that requires changes to applications, but in an attempt not to change APIs or servers, specifies that the A-label format is to be used in many contexts. In some ways this could be seen as not changing the existing APIs, in the sense that the strings being passed to and from the APIs are still apparently ASCII strings. In other ways it is a very profound change to the existing APIs, because while those strings are still syntactically valid ASCII strings, they no longer mean the same thing that they used to. What looks like a plain ASCII string to one piece of software or library could be seen by another piece of software or library (with the application of out-of-band information) to be in fact an encoding of a Unicode string.
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).
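The point that an apparently plain ASCII string may in fact encode a Unicode string can be sketched with Python's built-in "idna" codec. Note the assumption: that codec implements the older IDNA 2003 rules (RFC 3490), not IDNA2008, and the label used is just a sample.

```python
a_label = "xn--bcher-kva"   # syntactically valid ASCII

# Software unaware of IDNA sees nothing but ASCII here.
is_ascii = a_label.isascii()

# Software applying the out-of-band knowledge decodes it to a Unicode label.
u_label = a_label.encode("ascii").decode("idna")

# Encoding the Unicode form gives back the same ASCII bytes.
round_trip = u_label.encode("idna")
```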

[M] - ENUM

A system of mapping telephone numbers (formally known as E.164 numbers after the telephone numbering standard) to Internet resources.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
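The mapping works by turning the number into a domain name under e164.arpa: per RFC 6116, the digits are reversed and separated by dots. A minimal Python sketch follows; the function name and the sample number are illustrative only, and the subsequent DNS lookup of NAPTR records is not shown.

```python
def enum_domain(e164_number, suffix="e164.arpa"):
    """Build the ENUM domain name for an E.164 number such as "+44 20 7946 0148"."""
    # Keep only ASCII digits, dropping "+", spaces and other punctuation.
    digits = [c for c in e164_number if c in "0123456789"]
    # Reverse the digits and make each one its own label.
    return ".".join(reversed(digits)) + "." + suffix

name = enum_domain("+44 20 7946 0148")
```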

[M] - EPP

see Extensible Provisioning Protocol.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Equivalence

In the context of text processing, the process or result of establishing whether two text elements are identical in some respect.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Equivalent Sequence

(See canonical equivalent.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Escape Sequence

A sequence of bytes that is used for code extension. The first byte in the sequence is escape (hex 1B).
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
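ISO-2022-JP is one encoding that relies on such sequences to switch character sets. The Python sketch below uses it only as an example; the entry itself does not name a particular encoding.

```python
ESC = 0x1B

encoded = "\u65e5\u672c".encode("iso2022_jp")   # two kanji

# The encoded form begins with an escape sequence that switches
# from ASCII to a double-byte character set.
starts_with_escape = encoded[0] == ESC

decoded = encoded.decode("iso2022_jp")
```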

[M] - EUDC

Acronym for end-user defined character. A character defined by an end user, using a private-use code point, to represent a character missing in a particular character encoding. These are common in East Asian implementations.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - European Digits

Forms of decimal digits first used in Europe and now used worldwide. Historically, these digits were derived from the Arabic digits; they are sometimes called “Arabic numerals,” but this nomenclature leads to confusion with the real Arabic digits.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Evolution

  • The ICANN ICP-3 document stated in 2001: "In an ever-evolving Internet, ultimately there may be better architectures for getting the job done where the need for a single, authoritative root will not be an issue. But that is not the case today. And the transition to such an architecture, should it emerge, would require community-based approaches. In the interim, responsible experimentation should be encouraged, but it should not be done in a manner that affects those who do not consent after being informed of the character of the experiment." It was not the case in 2001; it is the case a decade later.
    (source: ICANN, author: unknown, text listed to be reviewed and possibly changed).

[M] - Extended Base

Any base character, or any standard Korean syllable block. (See definition D51a in Unicode 5.1.0.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Extended Combining Character Sequence

A maximal character sequence consisting of either an extended base followed by a sequence of one or more characters where each is a combining character, zero width joiner, or zero width non-joiner; or a sequence of one or more characters where each is a combining character, zero width joiner, or zero width non-joiner. Abbreviated as ECCS. (See definition D56a in Unicode 5.1.0.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Extended Grapheme Cluster

The text between extended grapheme cluster boundaries as specified by Unicode Standard Annex #29, “Unicode Text Segmentation.” Abbreviated as EGC. (See definition D61 in Unicode 5.1.0.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Extensible Markup Language

see XML.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Extensible Provisioning Protocol (EPP)

A protocol used for electronic communication between a registrar and a registry for provisioning domain names.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Fancy Text

(See rich text.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - first come, first served (FCFS)

The principle of allocation of most Internet resources. It means that, assuming you meet any relevant qualifying criteria (such as meeting policy requirements, including possibly demonstrating need, and paying any relevant fees), you are allowed to register a given resource if you are the first one to lay claim to it. Most IANA registries are administered on a “first come, first served” basis.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Fixed Position Class

A subset of the range of numeric values for combining classes—specifically, any value in the range 10..199. (See definition D105 in Section 3.11, Canonical Ordering Behavior.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Floating (diacritic, accent, mark)

(See nonspacing mark.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Folding

An operation that maps similar characters to a common target, such as uppercasing or lowercasing a string. Folding operations are most often used to temporarily ignore certain distinctions between characters.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
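Case folding is the most common instance. As a small Python sketch, `str.casefold()` maps strings to a common target so that case distinctions are temporarily ignored in a comparison; the helper name is hypothetical.

```python
def same_ignoring_case(a, b):
    """Compare two strings after folding both to a common caseless form."""
    return a.casefold() == b.casefold()

german = same_ignoring_case("Stra\u00dfe", "STRASSE")   # sharp s folds to "ss"
acronym = same_ignoring_case("IANA", "iana")
```

The original strings are left untouched; only the folded copies are compared.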

[M] - Font

A collection of glyphs used for the visual depiction of character data. A font is often associated with a set of parameters (for example, size, posture, weight, and serifness), which, when set to particular values, generate a collection of imagable glyphs.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Format Character

A character that is inherently invisible but that has an effect on the surrounding characters.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Format Code

Synonym for format character.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Formatted Text

FQDN: A fully qualified domain name, one that explicitly contains all labels, including a Top-Level Domain (TLD) name. In this context, a TLD name is one whose label appears in a nameserver record in the root zone. The term "Domain Name Label" refers to any label of a FQDN.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).

[M] - FreeClass

a sequence of letters, numbers, symbols, spaces, and other code points that is used for more expressive purposes in an application protocol (e.g., a free-form identifier such as a human-friendly nickname in a chatroom).
(source: draft-blanchet-precis-framework-03, author: M. Blanchet, P. Saint-Andre, text listed to be reviewed and possibly changed).

[M] - FSS-UTF

Acronym for File System Safe UCS Transformation Format, published by the X/Open Company Ltd., and intended for the UNIX environment. Now known as UTF-8.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Full Composition Exclusion

A Canonical Decomposable Character which has the property value Full_Composition_Exclusion=True. (Used in the definition of Unicode Normalization Forms.) (See definition D113 in Section 3.11, Normalization Forms.)
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).
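The practical effect shows up in NFC normalization: a character with this property decomposes but is never recomposed. A Python sketch using U+0958, which is listed among the composition exclusions, next to U+00E9, which is not:

```python
import unicodedata

excluded = "\u0958"       # DEVANAGARI LETTER QA, excluded from composition
not_excluded = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# NFC decomposes first, then recomposes everything except excluded characters.
nfc_excluded = unicodedata.normalize("NFC", excluded)
nfc_not_excluded = unicodedata.normalize("NFC", "e\u0301")
```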

[M] - Fullwidth

Characters of East Asian character sets whose glyph image extends across the entire character display cell. In legacy character sets, fullwidth characters are normally encoded in two or three bytes. The Japanese term for fullwidth characters is zenkaku.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
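In Unicode, this distinction is recorded in the East_Asian_Width property, which Python exposes through `unicodedata.east_asian_width`. A brief sketch with a sample character chosen for illustration:

```python
import unicodedata

fullwidth_a = "\uff21"   # FULLWIDTH LATIN CAPITAL LETTER A

width_class = unicodedata.east_asian_width(fullwidth_a)   # "F" means fullwidth
ascii_width_class = unicodedata.east_asian_width("A")     # "Na" means narrow

# Compatibility normalization folds the fullwidth form to the ordinary letter.
folded = unicodedata.normalize("NFKC", fullwidth_a)
```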

[M] - fully-qualified domain name (FQDN)

A complete domain name including all its components, for example “www.icann.org” as opposed to “www”.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Fundamental Label

A U-label that consists only of Valid Code Points. In practice, this is the U-label requested to be registered.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).

[M] - Fundamental TLD

The Fundamental Label form of a Variant TLD Set.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).

[M] - GAC Principles

A document, formally known as the Principles for the Delegation and Administration of ccTLDs. This document was developed by the ICANN Governmental Advisory Committee and documents a set of principles agreed by governments on how ccTLDs should be delegated and run. It is one of a number of documents considered when ICANN evaluates a ccTLD delegation request.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - GC

1. Acronym for grapheme cluster. 2. Short name for the General_Category property, usually lowercased: gc.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - GCGID

Acronym for Graphic Character Global Identifier. These are listed in the IBM document Character Data Representation Architecture, Level 1, Registry SC09-1391.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - General Category

Partition of the characters into major classes such as letters, punctuation, and symbols, and further subclasses for each of the major classes. (See Section 4.5, General Category—Normative.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Generative

Synonym for productive.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - generic top-level domains (gTLDs)

A class of top-level domains that are used for general purposes, where ICANN has a strong role in coordination (as opposed to country-code top-level domains, which are managed locally). For policy reasons, these are usually subdivided into sponsored top-level domains and unsponsored top-level domains.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Globalization (proposed by Unicode)

Strings from a single language are used in protocols together with strings of other languages through:
(source: UNICODE, author: Jefsey Morfin, text listed to be reviewed and possibly changed).
  • internationalization of the media: the characters used by the scripts of these other languages are supported by the medium.
    (source: UNICODE, author: Jefsey Morfin, text listed to be reviewed and possibly changed).
  • localization: adaptation of the protocols to the locale language and orthotypography.
    (source: UNICODE, author: Jefsey Morfin, text listed to be reviewed and possibly changed).
  • taggization: indication of the language being used. It can be used for filtering.
    (source: UNICODE, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - glue record

An explicit notation of the IP address of a name server, placed in a zone outside of the zone that would ordinarily contain that information. This is required because in some circumstances it would be impossible to find the name server otherwise, such as when the name server is in-bailiwick. All name servers are in-bailiwick of the Root Zone, therefore glue records are required for all name servers listed there. Also referred to as just “glue”.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - glyph

A glyph is an abstract form that represents one or more glyph images. The term "glyph" is often a synonym for glyph image, which is the actual, concrete image of a glyph representation having been rasterized or otherwise imaged onto some display surface. In displaying character data, one or more glyphs may be selected to depict a particular character. These glyphs are selected by a rendering engine during composition and layout processing.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
(1) An abstract form that represents one or more glyph images. (2) A synonym for glyph image. In displaying Unicode character data, one or more glyphs may be selected to depict a particular character. These glyphs are selected by a rendering engine during composition and layout processing. (See also character.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Glyph Code

A numeric code that refers to a glyph. Usually, the glyphs contained in a font are referenced by their glyph code. Glyph codes may be local to a particular font; that is, a different font containing the same glyphs may use different codes.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - glyph code

A glyph code is a numeric code that refers to a glyph. Usually, the glyphs contained in a font are referenced by their glyph code.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Glyph codes are local to a particular font; that is, a different font containing the same glyphs may use different codes.
transcoding: Transcoding is the process of converting text data from one character encoding form to another. Transcoders work only at the level of character encoding and do not parse the text. Note:
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Transcoding may involve one-to-one, many-to-one, one-to-many or many-to-many mappings. Because some legacy mappings are glyphic, they may not only be many-to-many, but also unordered: thus XYZ may map to yxz. [CHARMOD] In this definition, "many-to-one" means a sequence of characters mapped to a single character. The "many" does not mean alternative characters that map to the single character.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
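The transcoding step described above can be sketched in Python. This is a minimal illustration, not part of the quoted definition: the helper name transcode, the sample string, and the choice of ISO 8859-1 and UTF-8 are assumptions.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert encoded text from one character encoding to another."""
    # Decode to abstract characters, then re-encode; the text itself is not parsed.
    return data.decode(source).encode(target)


latin1_bytes = "café".encode("iso-8859-1")   # "é" is the single byte 0xE9
utf8_bytes = transcode(latin1_bytes, "iso-8859-1", "utf-8")

print(latin1_bytes.hex())  # 636166e9
print(utf8_bytes.hex())    # 636166c3a9
```

Here one byte in the source encoding maps to two bytes in the target, while the character sequence is unchanged.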

[M] - Glyph Identifier

Similar to a glyph code, a glyph identifier is a label used to refer to a glyph within a font. A font may employ both local and global glyph identifiers.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Glyph Image

The actual, concrete image of a glyph representation having been rasterized or otherwise imaged onto some display surface.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Glyph Metrics

A collection of properties that specify the relative size and positioning along with other features of a glyph.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - graphcode

Table of graphic symbols used as script characters, independently from the language being supported. It could serve as an algorithm for variants and anti-phishing.
(source: ALFA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - Grapheme

(1) A minimally distinctive unit of writing in the context of a particular writing system. For example, ‹b› and ‹d› are distinct graphemes in English writing systems because there exist distinct words like big and dig. Conversely, a lowercase italiform letter a and a lowercase Roman letter a are not distinct graphemes because no word is distinguished on the basis of these two different forms. (2) What a user thinks of as a character.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Grapheme Base

A character with the property Grapheme_Base, or any standard Korean syllable block. (See definition D58 in Section 3.6, Combination.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Grapheme Cluster

The text between grapheme cluster boundaries as specified by Unicode Standard Annex #29, "Unicode Text Segmentation." (See definition D60 in Unicode 5.1.0.) A grapheme cluster represents a horizontally segmentable unit of text, consisting of some grapheme base (which may consist of a Korean syllable) together with any number of nonspacing marks applied to it.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
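The "base plus nonspacing marks" idea can be approximated with Python's standard unicodedata module. This is a deliberately simplified sketch, not an implementation of Unicode Standard Annex #29: the function name simple_clusters is hypothetical, and Korean syllable sequences, emoji sequences, and the other segmentation rules are ignored.

```python
import unicodedata


def simple_clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplified approximation only: marks are recognised by General
    Category Mn (nonspacing) or Me (enclosing); full UAX #29 rules
    are not applied.
    """
    clusters: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Me")
        if is_mark and clusters:
            clusters[-1] += ch      # attach the mark to the preceding base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters


word = "e\u0301a"   # "e" + COMBINING ACUTE ACCENT, then "a"
print(len(word), len(simple_clusters(word)))  # 3 2
```

Three code points yield two user-perceived units, which is the distinction the definition draws.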

[M] - Grapheme Extender

A character with the property Grapheme_Extend. (See definition D59 in Section 3.6, Combination.) Grapheme extender characters consist of all nonspacing marks, zero width joiner, zero width non-joiner, and a small number of spacing marks.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Graphic Character

A character with the General Category of Letter (L), Combining Mark (M), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs). (See definition D50 in Section 3.6, Combination.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
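As an illustration, the General Category test above can be written with Python's unicodedata module. The function name is_graphic is an assumption made for this sketch, and the results reflect the Unicode version bundled with the Python interpreter in use.

```python
import unicodedata


def is_graphic(ch: str) -> bool:
    """True if ch has General Category L*, M*, N*, P*, S*, or Zs."""
    category = unicodedata.category(ch)
    return category[0] in "LMNPS" or category == "Zs"


print(is_graphic("A"))       # True  (Lu)
print(is_graphic(" "))       # True  (Zs)
print(is_graphic("\n"))      # False (Cc, a control character)
print(is_graphic("\u2028"))  # False (Zl, line separator)
```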

[M] - graphic symbol

A graphic symbol is the visual representation of a graphic character or of a composite sequence. <ISOIEC10646>
font: A font is a collection of glyphs used for the visual depiction of character data. A font is often associated with a set of parameters (for example, size, posture, weight, and serifness), which, when set to particular values, generate a collection of imagable glyphs. <UNICODE>
The term "font" is often used interchangeably with "typeface". As historically used in typography, a typeface is a family of one or more fonts that share a common general design. For example, "Times Roman" is actually a typeface, with a collection of fonts such as "Times Roman Bold", "Times Roman Medium", "Times Roman Italic", and so on. Some sources even consider different type sizes within a typeface to be different fonts. While those distinctions are rarely important for internationalization purposes, there are exceptions. Those writing specifications should be very careful about definitions in cases in which the exceptions might lead to ambiguity.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Guillemet

Punctuation marks resembling small less-than and greater-than signs, used as quotation marks in French and other languages. (See “Language-Based Usage of Quotation Marks” in Section 6.2, General Punctuation.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Halant

A preferred Hindi synonym for a virama. It literally means killer, referring to its function of killing the inherent vowel of a consonant letter. (See virama.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Half-Consonant Form

In the Devanagari script and certain other scripts of the Brahmi family of Indic scripts, a dead consonant may be depicted in the so-called half-form. This form is composed of the distinctive part of a consonant letter symbol without its vertical stem. It may be used to create conjunct forms that follow a horizontal layout pattern. Also known as half-form.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Halfwidth

Characters of East Asian character sets whose glyph image occupies half of the character display cell. In legacy character sets, halfwidth characters are normally encoded in a single byte. The Japanese term for halfwidth characters is hankaku.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
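A short Python sketch can make the halfwidth property and the single-byte legacy encoding concrete. The sample characters and the use of Shift_JIS as the legacy character set are assumptions chosen for illustration.

```python
import unicodedata

halfwidth_ka = "\uFF76"   # HALFWIDTH KATAKANA LETTER KA
fullwidth_ka = "\u30AB"   # KATAKANA LETTER KA (ordinary, wide form)

# East_Asian_Width property: "H" = halfwidth, "W" = wide
print(unicodedata.east_asian_width(halfwidth_ka))  # H
print(unicodedata.east_asian_width(fullwidth_ka))  # W

# In the legacy Shift_JIS encoding the halfwidth form takes a single byte,
# the wide form two bytes.
print(len(halfwidth_ka.encode("shift_jis")), len(fullwidth_ka.encode("shift_jis")))
```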

[M] - Han Characters

Ideographic characters of Chinese origin. (See Section 12.1, Han.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Han Unification

The process of identifying Han characters that are in common among the writing systems of Chinese, Japanese, Korean, and Vietnamese.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Hangul

The name of the script used to write the Korean language.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Hangul Syllable

(1) Any of the 11,172 encoded characters of the Hangul Syllables character block, U+AC00..U+D7A3. Also called a precomposed Hangul syllable to clearly distinguish it from a Korean syllable block. (2) Loosely speaking, a Korean syllable block.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
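The figure of 11,172 follows from the block boundaries and from the way precomposed syllables are laid out. The sketch below assumes the usual arithmetic layout of the block (19 leading consonants, 21 vowels, 28 trailing options including "none"), which is not stated in the entry above; the function name decompose is hypothetical.

```python
import unicodedata

FIRST, LAST = 0xAC00, 0xD7A3

# 19 leading consonants x 21 vowels x 28 trailing options (including none)
assert LAST - FIRST + 1 == 19 * 21 * 28 == 11172


def decompose(syllable: str) -> tuple[int, int, int]:
    """Return (leading, vowel, trailing) indices of a precomposed syllable."""
    index = ord(syllable) - FIRST
    if not 0 <= index <= LAST - FIRST:
        raise ValueError("not a precomposed Hangul syllable")
    return index // (21 * 28), (index % (21 * 28)) // 28, index % 28


print(decompose("한"))                           # (18, 0, 4)
print(len(unicodedata.normalize("NFD", "한")))   # 3 conjoining jamo
```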

[M] - Hanja

The Korean name for Han characters; derived from the Chinese word hànzì.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Hankaku

(See halfwidth.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Hànzì

The Mandarin Chinese name for Han characters.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Harakat

Marks that indicate vowels or other modifications of consonant letters in Arabic script.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Hasant

The Bangla name for halant. (See virama.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Higher-Level Protocol

Any agreement on the interpretation of Unicode characters that extends beyond the scope of this standard. Note that such an agreement need not be formally announced in data; it may be implicit in the context. (See definition D16 in Section 3.4, Characters and Encoding.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - High-Surrogate Code Point

A Unicode code point in the range U+D800 to U+DBFF. (See definition D71 in Section 3.8, Surrogates.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - High-Surrogate Code Unit

A 16-bit code unit in the range D800₁₆ to DBFF₁₆, used in UTF-16 as the leading code unit of a surrogate pair. Also known as a leading surrogate. (See definition D72 in Section 3.8, Surrogates.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
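How a supplementary code point is split into a leading (high) and trailing (low) surrogate can be sketched as follows. The function name to_surrogates and the sample code point U+1F600 are assumptions for illustration; the result is cross-checked against Python's own UTF-16 encoder.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points use surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits -> D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> DC00..DFFF
    return high, low


high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00

# Cross-check against the built-in encoder.
encoded = chr(0x1F600).encode("utf-16-be")
assert encoded == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```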

[M] - hints file

A file stored in DNS software (i.e. recursive name servers) that tells it where the DNS root servers are located. Because the DNS is used to self-discover where its servers are located, this file is used to boot-strap the process when the DNS software knows nothing.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Hiragana

One of two standard syllabaries associated with the Japanese writing system. Hiragana syllables are typically used in the representation of native Japanese words and grammatical particles.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - i18n, l10n

These are abbreviations for "internationalization" and "localization". "18" is the number of characters between the "i" and the "n" in "internationalization", and "10" is the number of characters between the "l" and the "n" in "localization".
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
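The counting rule can be checked with a few lines of Python. The function name numeronym is an assumption for this sketch.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        return word   # nothing between the first and last letter
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("internationalization"))  # i18n
print(numeronym("localization"))          # l10n
```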

[M] - IANA Considerations

A component of RFCs that refers to any work required by IANA to maintain registries for a specific protocol.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - IANA Contract

The contract between ICANN and the US Government that governs how the IANA functions are performed.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - IANA Staff

[M] - ICP-1

A document written by IANA staff in 1999 describing how they manage top-level domains. Compare RFC 1591.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - ICP-2

A document describing how new regional Internet registries may be created.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - ICP-3

A document describing the requirement for a unique, authoritative DNS root zone. See also RFC 2826.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - ICU

Acronym for International Components for Unicode, an Open Source set of C/C++ and Java libraries for Unicode and software internationalization support. For information, see http://www.icu-project.org/
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Ideograph

(1) Any symbol that primarily denotes an idea (or meaning) in contrast to a sound (or pronunciation)—for example, a symbol showing a telephone. (2) An English term commonly used to refer to Han characters, equivalent to the borrowings hànzì, kanji, and hanja.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - ideographic

Any symbol that primarily denotes an idea (or meaning) in contrast to a sound (or pronunciation), for example, a symbol showing a telephone or the Han characters used in Chinese, Japanese, and Korean. <UNICODE> While Unicode and many other systems use this term to refer to all Han characters, strictly speaking not all of those characters are actually ideographic. Some are pictographic (such as the telephone example above), some are used phonetically, and so on.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
However, the convention is to describe the script as ideographic as contrasted to alphabetic.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Ideographic Property

Informative property of characters that are ideographs. (See Section 4.10, Letters, Alphabetic, and Ideographic.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - IDL

IDL: This document provides a guideline to be applied on a per-zone basis, one label at a time. Therefore, the term "Internationalized Domain Label" or "IDL" will be used instead of the more general term "IDN" or its equivalents. The processing specifications of this document may be applied, in some zones, to ASCII characters also, if those characters are specified as valid in a Language Variant Table (see below). Hence, in some zones, an IDL may contain or consist entirely of "LDH" characters.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).

[M] - IDL Package

IDL Package: A collection of IDLs as determined by these Guidelines.
All labels in the package are "reserved", meaning they cannot be registered by anyone other than the holder of the Package. These reserved IDLs may be "activated", meaning they are actually entered into a zone file as a "Zone Variant". The IDL Package also contains identification of the language(s) associated with the registration process. The IDL and its variant labels form a single, atomic unit.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).

[M] - IDN

An IDN is a domain name that contains one or more labels that, in turn, contain one or more non-ASCII characters. Just as with plain ASCII domain names, each IDN label must be encoded using some mechanism before it can be transmitted in network packets, stored in memory, stored on disk, etc. These encodings need to be reversible, but they need not store domain names the same way humans conventionally write them on paper. For example, when transmitted over the network in DNS packets, domain name labels are *not* separated with dots.
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).
See Internationalised Domain Name.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
IDN: The term "IDN" has a number of different uses: (a) as an abbreviation for "Internationalized Domain Name"; (b) as a fully qualified domain name that contains at least one label that contains characters not appearing in ASCII, specifically not in the subset of ASCII recommended for domain names (the so-called "hostname" or "LDH" subset, see RFC1035 [STD13]); (c) as a label of a domain name that contains at least one character beyond ASCII; (d) as a Unicode string to be processed by Nameprep; (e) as a string that is an output from Nameprep; (f) as a string that is the result of processing through both Nameprep and conversion into Punycode; (g) as the abbreviation of an IDN (more properly, IDL) Package, in the terminology of this document; (h) as the abbreviation of the IETF IDN Working Group; (i) as the abbreviation of the ICANN IDN Committee; and (j) as standing for other IDN activities in other companies/organizations.
Because of the potential confusion, this document uses the term "IDN" as an abbreviation for Internationalized Domain Name and, specifically, in the second sense described in (b) above. It uses "IDL," defined immediately below, to refer to Internationalized Domain Labels.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).
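The remark that labels are not separated with dots in DNS packets can be illustrated with a small sketch of the wire encoding, in which each label is preceded by a one-byte length and the name ends with a zero-length root label. The function name to_wire is an assumption; the sketch handles ASCII labels only and performs no name compression.

```python
def to_wire(name: str) -> bytes:
    """Encode an ASCII domain name as length-prefixed labels."""
    wire = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"invalid label length: {label!r}")
        wire.append(len(raw))   # one length byte replaces the written dot
        wire += raw
    wire.append(0)              # zero-length root label terminates the name
    return bytes(wire)


print(to_wire("www.example.com"))  # b'\x03www\x07example\x03com\x00'
```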

[M] - IDN Practices Repository

A repository on IANA’s website where top-level domain registries contribute the IDN tables they use. This allows other registries to re-use the tables if they wish.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - IDN Table

A list of permissible Unicode code points allowed for registration in domain names by a registry. Usually, these are applied on a language or script basis.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - IDNA

Internationalized Domain Names for Applications (IDNA) is the standard that defines the use and coding of internationalized domain names for use on the public Internet [RFC 5890]. An earlier version of IDNA [RFC 3490] is now being phased out. Except where noted, the two versions are approximately the same with regard to the issues discussed in this document. However, some explanations appeared in the earlier documents that were no longer considered useful when the later revision was created; they are quoted here from the documents in which they appear. In addition, the terminology of the two versions differs somewhat; this document reflects the terminology of the current version.
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).

[M] - IDNA Symmetry Constraint

A-label/U-label transformation must be symmetric: an A-label A1 must be capable of being produced by conversion from a U-label U1, and that U-label U1 must be capable of being produced by conversion from A-label A1. (RFC 5890)
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).
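The round trip required by the symmetry constraint can be sketched with Python's built-in "punycode" codec. This is only an illustration under stated assumptions: the function names are hypothetical, the sample label is arbitrary, and none of the IDNA2008 validity checks (permitted code points, normalization, contextual rules) are performed.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Convert a (pre-validated, lowercase) U-label to its A-label form."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Convert an A-label back to its U-label form."""
    if not a_label.lower().startswith(ACE_PREFIX):
        raise ValueError("not an A-label")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


u1 = "bücher"
a1 = to_a_label(u1)
print(a1)                     # xn--bcher-kva
print(to_u_label(a1) == u1)   # True: the transformation is symmetric
```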

[M] - IDNA vs IDNApplication

The main question raised by IDNA is: what guarantees to users that the IDNA APIs in the different applications work the same way and lead to the same resolution? This is addressed by removing the API from applications, which are thereby kept neutral with respect to the User Domain Name encoding, and by running a single IDNApplication that provides a common IDN Service as a DNS Front-End. Such an IDNS Front-End can support a full multi-layer IDN-Pile of polynyms in the different formats being used on the local or global network.
(source: ALFA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).
(fig. 1) Conceptual model of the IDNS.
(source: ALFA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - IDNA-valid string

A string is "IDNA-valid" if it meets all of the requirements of these specifications for an IDNA label. IDNA-valid strings may appear in either of the two forms defined immediately below, or may be drawn from the NR-LDH label subset. IDNA-valid strings must also conform to all basic DNS requirements for labels. These documents make specific reference to the form appropriate to any context in which the distinction is important.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - IDNS

IDNA2008 has split the management of the naming space between the core of the network, its periphery (the Internet Use Interface - IUI) and the external users (as exemplified in RFC 5895). This has introduced a subsidiarity-based system which is an integrated, intelligent, international extension of the Internet DNS that can also be used outside of the internet, with other technologies as well. It is proposed that the internet part of the whole subsidiary domain name system (SDNS) is named "IDNS". Following this logic:
(source: ALFA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).
  • "ADN" means ASCII specific (legacy) domain name,
    (source: ALFA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).
  • "IDN" means regular Internet domain name as supported by IDNA2008,
    (source: ALFA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).
  • and "UDN" means UTF-8/16/32 User domain name.
    (source: ALFA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).
The International IDNA2008 oriented "International Domain Names" (IDNs) resolution system along the Internet community operational rules.
(source: IUTF, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - IICore

A subset of common-use CJK unified ideographs, defined as the fixed collection 370 IICore in ISO/IEC 10646. This subset contains 9,810 ideographs and is intended for common use in East Asian contexts, particularly for small devices that cannot support the full range of CJK unified ideographs encoded in the Unicode Standard.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Ill-Formed Code Unit Sequence

A code unit sequence that does not follow the specification of a Unicode encoding form. (See definition D84 in Section 3.9, Unicode Encoding Forms.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
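A strict decoder is one practical way to detect ill-formed sequences. The sketch below relies on Python's built-in UTF-8 codec, which rejects ill-formed input by default; the function name and the sample byte strings are assumptions chosen for illustration.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """True if data is a well-formed UTF-8 code unit sequence."""
    try:
        data.decode("utf-8")   # error handling is "strict" by default
        return True
    except UnicodeDecodeError:
        return False


print(is_well_formed_utf8(b"\xc3\xa9"))      # True:  U+00E9
print(is_well_formed_utf8(b"\xc0\x80"))      # False: overlong form
print(is_well_formed_utf8(b"\xed\xa0\x80"))  # False: encodes a surrogate
print(is_well_formed_utf8(b"\xc3"))          # False: truncated sequence
```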

[M] - Ill-Formed Code Unit Subsequence

A non-empty subsequence of a Unicode code unit sequence X which does not contain any code units which also belong to any minimal well-formed subsequence of X. (See definition D84a in Unicode 5.1.0.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - in-bailiwick

Describes a domain name that is a sub-domain of another; used for identifying whether a glue record is required. For example, “iana.org” is in the bailiwick of “org”. All domains are considered in-bailiwick of the DNS Root Zone.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
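The sub-domain test can be sketched as a label-by-label suffix comparison. The function name in_bailiwick is hypothetical, and treating a name as in-bailiwick of itself is an assumption of this sketch rather than something the definition states.

```python
def in_bailiwick(name: str, zone: str) -> bool:
    """True if name is at or below zone, compared label by label."""
    name_labels = [l for l in name.lower().rstrip(".").split(".") if l]
    zone_labels = [l for l in zone.lower().rstrip(".").split(".") if l]
    if not zone_labels:
        return True   # every name is in-bailiwick of the root zone
    return name_labels[-len(zone_labels):] == zone_labels


print(in_bailiwick("iana.org", "org"))         # True
print(in_bailiwick("ns.example.com", "org"))   # False
print(in_bailiwick("ns.example.com", "."))     # True (root zone)
print(in_bailiwick("notorg", "org"))           # False: labels, not substrings
```

Comparing whole labels rather than raw string suffixes avoids treating "notorg" as if it were under "org".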

[M] - In-Band

An in-band channel conveys information about text by embedding that information within the text itself, with special syntax to distinguish it. In-band information is encoded in the same character set as the text, and is interspersed with and carried along with the text data. Examples are XML and HTML markup.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Indefinite Growth

[RFC 3439] documents that the unifying principle is best expressed by the Simplicity Principle, which states complexity must be controlled if one hopes to efficiently scale a complex object.
(source: RFC 3439, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - Independent Vowel

In Indic scripts, certain vowels are depicted using independent letter symbols that stand on their own. This is often true when a word starts with a vowel or a word consists of only a vowel.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Indic Digits

Forms of decimal digits used in various Indic scripts (for example, Devanagari: U+0966, U+0967, U+0968, U+0969). Arabic digits (and, eventually, European digits) derive historically from these forms.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
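The Devanagari code points listed above carry ordinary decimal digit values, which Python exposes through the unicodedata module and the built-in int(). A brief sketch:

```python
import unicodedata

devanagari = "\u0966\u0967\u0968\u0969"   # the four code points cited above

print(unicodedata.name(devanagari[0]))                # DEVANAGARI DIGIT ZERO
print([unicodedata.digit(ch) for ch in devanagari])   # [0, 1, 2, 3]

# int() accepts any Unicode decimal digits, not only ASCII 0-9.
print(int("\u0967\u0968\u0969"))                      # 123
```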

[M] - INFORMATION

The differences that make a difference (Gregory Bateson).
(source: ALFA, author: JFC Morfin, text listed to be reviewed and possibly changed).

[M] - Informative

Information in this standard that is not normative but that contributes to the correct use and implementation of the standard.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - infrastructure domain, infrastructure top-level domain

A term sometimes used for “.ARPA” and its sub-domains, as it does not fit into the other categorisations of top-level domains.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Inherent Vowel

In writing systems based on a script in the Brahmi family of Indic scripts, a consonant letter symbol normally has an inherent vowel, unless otherwise indicated. The phonetic value of this vowel differs among the various languages written with these writing systems. An inherent vowel is overridden either by indicating another vowel with an explicit vowel sign or by using virama to create a dead consonant.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Inner Caps

Mixed case format where an uppercase letter is in a position other than first in the word—for example, “G” in the name “McGowan.”
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - input methods

An input method is a mechanism for a person to enter text into an application. <NONE> Text can be entered into a computer in many ways. Keyboards are by far the most common device used, but many characters cannot be entered on typical computer keyboards in a single stroke. Many operating systems come with system software that lets users input characters outside the range of what is allowed by keyboards.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
For example, there are dozens of different input methods for Han characters in Chinese, Japanese, and Korean. Some start with phonetic input through the keyboard, while others use the number of strokes in the character. Input methods are also needed for scripts that have many diacritics, such as European or Vietnamese characters that have two or three diacritics on a single alphabetic character.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The term "input method editor" (IME) is often used generically to describe the tools and software used to deal with input of characters on a particular system.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Interim Trust Anchor Repository (ITAR)

A proposed IANA service whereby the trust anchors for top-level domains can be listed separately from the DNS root zone. This is a temporary measure due to the inability to use DNSSEC to sign the root zone.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - internationalised domain name (IDN)

A domain name that uses characters outside the 37 characters allowed by the “LDH rule”, using a system known as IDNA. This allows for domain names in non-Latin scripts, such as Arabic, Japanese or Cyrillic.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
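The count of 37 corresponds to 26 letters, 10 digits and the hyphen. A minimal sketch of an LDH check follows; the function name is hypothetical, and the rules that a label is 1 to 63 characters long and neither starts nor ends with a hyphen are the conventional hostname rules, assumed here rather than taken from the definition.

```python
import string

LDH = set(string.ascii_lowercase + string.digits + "-")
assert len(LDH) == 37   # 26 letters + 10 digits + hyphen


def is_ldh_label(label: str) -> bool:
    """True if label uses only letters, digits and hyphen (case-insensitive)."""
    if not label.isascii() or not 1 <= len(label) <= 63:
        return False
    if label.startswith("-") or label.endswith("-"):
        return False
    return all(ch in LDH for ch in label.lower())


print(is_ldh_label("example"))        # True
print(is_ldh_label("bücher"))         # False: needs IDNA
print(is_ldh_label("xn--bcher-kva"))  # True: the encoded form is itself LDH
```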

[M] - Internationalised Domain Names in Applications (IDNA)

The Internet standard defining the encoding of internationalised domain names. The “in Applications” is in reference to the way the standard works, as the conversion happens in application software rather than in the network, and therefore does not affect the wire format of the DNS. The domains are internally coded in a special representation using the prefix “xn--”, known as an A-label. Described in Internet Standard RFC 3490.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - internationalization

In the IETF, "internationalization" means to add or improve the handling of non-ASCII text in a protocol. Many protocols that handle text only handle one charset (US-ASCII), or leave the question of what CCS and encoding are used up to local guesswork (which leads, of course, to interoperability problems). If multiple charsets are permitted they must be explicitly identified [RFC2277]. Adding non-ASCII text to a protocol allows the protocol to handle more scripts, hopefully all of the ones useful in the world. In today's world, that is normally best accomplished by allowing Unicode encoded in UTF-8 only, thereby shifting conversion issues away from individual choices.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Internationalized Domain Name

An "internationalized domain name" (IDN) is a domain name that contains at least one A-label or U-label, but that otherwise may contain any mixture of NR-LDH labels, A-labels, or U-labels. Just as has been the case with ASCII names, some DNS zone administrators may impose restrictions, beyond those imposed by DNS or IDNA, on the characters or strings that may be registered as labels in their zones. Because of the diversity of characters that can be used in a U-label and the confusion they might cause, such restrictions are mandatory for IDN registries and zones even though the particular restrictions are not part of these specifications (the issue is discussed in more detail in Section 4.3 of the Protocol document [RFC5891]). Because these restrictions, commonly known as "registry restrictions", only affect what can be registered and not lookup processing, they have no effect on the syntax or semantics of DNS protocol messages; a query for a name that matches no records will yield the same response regardless of the reason why it is not in the zone. Clients issuing queries or interpreting responses cannot be assumed to have any knowledge of zone-specific restrictions or conventions. See the section on registration policy in the Rationale document [RFC5894] for additional discussion.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Internationalized Label

"Internationalized label" is used when a term is needed to refer to a single label of an IDN, i.e., one that might be any of an NR-LDH label, A-label, or U-label. There are some standardized DNS label formats, such as the "underscore labels" used for service location (SRV) records [RFC2782], that do not fall into any of the three categories and hence are not internationalized labels.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Internet Coordination Policy (ICP)

A series of documents created by ICANN between 1999 and 2000 describing management procedures. Three such documents were published before the numbering system stopped being used. Subsequent ICANN publications have not been given ICP numbers.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Internet Protocol address

see IP Address.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Internet Registry Information Service (IRIS)

A sophisticated protocol for looking up registration data. It is designed to supplant the WHOIS protocol by offering many technological improvements such as internationalisation, access control, automatic server discovery and structured formatting; however, to date it has not been adopted in any significant way. Documented in technical standard RFC 3981 and others.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Internet Telephony Administrative Domain (ITAD)

A unique numbering system used by Telephone Routing over Internet Protocol (TRIP) to label phone services within an organisation. A company may apply for an ITAD number to use in numbering systems without conflicting with other companies and users. See RFC 3219.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - IP address block

A range of IP addresses that is assigned in a contiguous block. Usually the size of the range is described as the number of binary “bits” masked by the allocation. For example a “slash 24” or “/24” refers to a block of 256 IP addresses in IPv4.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
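The arithmetic behind the "/24" example can be sketched as follows. The helper block_size is hypothetical, and 192.0.2.0/24 is used only as a documentation-style example prefix.

```python
import ipaddress


def block_size(prefix_len: int, total_bits: int = 32) -> int:
    """Number of addresses in a block with the given prefix length."""
    return 2 ** (total_bits - prefix_len)


# Cross-check the formula against the standard library.
net = ipaddress.ip_network("192.0.2.0/24")
print(block_size(24), net.num_addresses)
```

For IPv4, a /24 fixes the first 24 of 32 bits, leaving 8 bits and therefore 2^8 addresses.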

[M] - IP address Space

The entire range of conceivable IP addresses. Managed by IANA, and generally delegated in blocks to Regional Internet Registries.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - IPA

(1) The International Phonetic Alphabet. (2) The International Phonetic Association, which defines and maintains the International Phonetic Alphabet.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - IPv4

Internet Protocol version 4. Refers to the version of Internet protocol that supports 32-bit IP addresses. This allows for approximately 4 billion unique IP addresses, which is not enough to cope with projected Internet demand in the next 5-10 years. Therefore, a new protocol called IPv6 has been developed that increases the number of possible IP addresses substantially.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - IPv6

Internet Protocol version 6. Refers to the version of Internet protocol that supports 128-bit IP addresses. This protocol is not yet widely deployed, but allows for orders-of-magnitude more IP addresses than the more common IPv4 protocol.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
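To make the difference in scale between 32-bit and 128-bit addressing concrete, here is a minimal sketch using Python's standard ipaddress module; the variable names are illustrative only.

```python
import ipaddress

# Size of each complete address space, taken from the all-zero /0 prefix.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(f"IPv4: {ipv4_total:,} addresses")
print(f"IPv6 is 2**{(ipv6_total // ipv4_total).bit_length() - 1} times larger")
```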

[M] - IRG

Acronym for Ideographic Rapporteur Group, a subgroup of ISO/IEC JTC1/SC2/WG2. (See Appendix E, Han Unification History.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - IRIS

See Internet Registry Information Service.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - IS NOT

This phrase, or the phrase "ARE NOT", means that the definition is an absolute fact beyond the specification reach.
(source: IUCG, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - ISCII

Acronym for Indian Script Code for Information Interchange.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - ISO 3166

A suite of international standards for labelling countries, territories, sub-national entities and former countries. Most notably, Part 1 of ISO 3166 (aka ISO 3166-1) is used by IANA to determine country-codes for top-level domains.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - ISO 3166 Maintenance Agency (ISO 3166/MA)

The agency of ISO tasked with maintaining the ISO 3166 standard. It is responsible for any updates, for example, when a country is created or ceases to exist. ICANN is one of the ten members of the ISO 3166/MA.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - ISO 3166-1

A part of the ISO 3166 suite of standards describing two- and three-letter codes that represent countries. The two-letter codes in ISO 3166-1 are used to determine the domains used for country-code top-level domains.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - ISO and ISO/IEC JTC 1

The International Organization for Standardization has been involved with standards for characters since before the IETF was started. ISO is a non-governmental group made up of national bodies. Most of ISO's work in information technology is performed jointly with a similar body, the International Electrotechnical Commission (IEC) through a joint committee known as "JTC 1". ISO and ISO/IEC JTC 1 have many diverse standards in the international characters area; the one that is most used in the IETF is commonly referred to as "ISO/IEC 10646", sometimes with a specific date.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
ISO/IEC 10646 describes a CCS that covers almost all known written characters in use today.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
ISO/IEC 10646 is controlled by the group known as "ISO/IEC JTC 1/SC 2 WG2", often called "SC2/WG2" or "WG2" for short. ISO standards go through many steps before being finished, and years often go by between changes to the base ISO/IEC 10646 standard although amendments are now issued to track Unicode changes.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Information on WG2, and its work products, can be found at <http://www.dkuug.dk/JTC1/SC2/WG2/>. Information on SC2, and its work products, can be found at <http://www.iso.org/iso/standards_development/technical_committees/list_of_iso_technical_committees/iso_technical_committee.htm?commid=45050>. The standard comes as a base part and a series of attachments or amendments. It is available in PDF form for downloading or in a CD-ROM version. One example of how to cite the standard is given in [RFC3629]. Any standard that cites ISO/IEC 10646 needs to evaluate how to handle the versioning problem that is relevant to the protocol's needs.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
ISO is responsible for other standards that might be of interest to protocol developers concerned about internationalization. ISO 639 [ISO639] specifies the names of languages and forms part of the basis for the IETF's Language Tag work [RFC5646]. ISO 3166 [ISO3166] specifies the names and code abbreviations for countries and territories and is used in several protocols and databases including names for country-code top level domain names. The responsibilities of ISO TC 46 on Information and Documentation <http://www.iso.org/iso/standards_development/technical_committees/list_of_iso_technical_committees/iso_technical_committee.htm?commid=48750> include a series of standards for transliteration of various languages into Latin characters.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Another relevant ISO group was JTC 1/SC22/WG20, which was responsible for internationalization in JTC1, such as for international string ordering. Information on WG20, and its work products, can be found at <http://www.dkuug.dk/jtc1/sc22/wg20/>.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The specific tasks of SC22/WG20 were moved from SC22 into SC2 and there has been little significant activity since that occurred.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - ISO/IEC 10646

ISO/IEC 10646: The international standard universal multiple-octet coded character set ("UCS") [IS10646]. The Code Point definitions of this standard are identical to those of corresponding versions of the Unicode standard (see below). Consequently, the characters and their coding are often referred to as "Unicode characters."
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).

[M] - ISO-2022-JP

ISO-2022-JP [RFC1468] is a mechanism for encoding a string of ASCII and Japanese characters, where an ASCII character is preserved as-is.
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).
ISO-2022-JP is stateful: special sequences are used to switch between character coding tables. As a result, if there are lost or mangled characters in a character stream, it is extremely difficult to recover the original stream after such a lost character encoding shift.
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).

[M] - ITAD

See Internet Telephony Administrative Domain.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - IUDNS

The Intelligent Use domain name system that permits one to manage and resolve User Domain Names (IUDNs) in any user chosen and documented format.
(source: IUTF, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - Jamo

The Korean name for a single letter of the Hangul script. Jamos are used to form Hangul syllables.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Joiner

An invisible character that affects the joining behavior of surrounding characters. (See Section 8.2, Arabic, and “Cursive Connection” in Section 16.2, Layout Controls.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Jon Postel

See Postel, Jon.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Jongseong

A sequence of one or more trailing consonants in Korean.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - JTC1

The Joint Technical Committee 1 of the International Organization for Standardization and the International Electrotechnical Commission responsible for information technology standardization.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Jungseong

A sequence of one or more vowels in Korean.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Kana

The name of a primarily syllabic script used by the Japanese writing system. It comes in two forms, hiragana and katakana. The former is used to write particles, grammatical affixes, and words that have no kanji form; the latter is used primarily to write foreign words.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Kanji

The Japanese name for Han characters; derived from the Chinese word hànzì. Also romanized as kanzi.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Katakana

One of two standard syllabaries associated with the Japanese writing system. Katakana syllables are typically used in representation of borrowed vocabulary (other than that of Chinese origin), sound-symbolic interjections, or phonetic representation of “difficult” kanji characters in Japanese.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Kerning

(1) Changing the space between certain pairs of letters to improve the appearance of the text. (2) The process of mapping from pairs of glyphs to a positioning offset used to change the space between letters.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Korean Syllable Block

A sequence of Korean jamos, consisting of one or more leading consonants followed by one or more vowels followed by zero or more trailing consonants, or any canonically equivalent sequence including a precomposed Hangul syllable. In regular expression notation: L L* V V* T*. Also called a standard Korean syllable block. (See Section 3.12, Conjoining Jamo Behavior.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
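The pattern L L* V V* T* can be tried out with the sketch below. The code point ranges are an assumption: they cover only the basic Hangul Jamo block (U+1100..U+11FF) and ignore the extended jamo blocks, and the function name is_syllable_block is hypothetical. Precomposed syllables are handled by decomposing them first, since the definition includes canonically equivalent sequences.

```python
import re
import unicodedata

# Assumed ranges from the basic Hangul Jamo block only.
L = "[\u1100-\u115F]"  # leading consonants (choseong)
V = "[\u1160-\u11A7]"  # vowels (jungseong)
T = "[\u11A8-\u11FF]"  # trailing consonants (jongseong)

# L L* V V* T*  is equivalent to  L+ V+ T*
SYLLABLE_BLOCK = re.compile(f"{L}+{V}+{T}*")


def is_syllable_block(text: str) -> bool:
    """True if text is exactly one standard Korean syllable block."""
    decomposed = unicodedata.normalize("NFD", text)
    return SYLLABLE_BLOCK.fullmatch(decomposed) is not None


print(is_syllable_block("한"), is_syllable_block("한글"))
```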

[M] - Label String

Label String: A generic term referring to a string of characters that is a candidate for registration in the DNS or such a string, once registered. A label string may or may not be valid according to the rules of this specification and may even be invalid for IDNA use.
The term "label", by itself, refers to a string that has been validated and may be formatted to appear in a DNS zone file.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).

[M] - language

A language is a way that humans interact. The use of language occurs in many forms, the most common of which are speech, writing, and signing. Some languages have a close relationship between the written and spoken forms, while others have a looser relationship. The so-called LTRU (Language Tag Registry Update) standards [RFC5646] [RFC4647] discuss languages in more detail and provide identifiers for languages for use in Internet protocols. Note that computer languages are explicitly excluded from this definition.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Language Character Repertoire

A set of Code Points identified by some identifier (such as a tag for identifying language as defined in RFC 5646). The definition of the Language Character Repertoire is ideally performed in a way appropriate to some community of language users, and might colloquially be understood as “the characters used to write a language”. In most cases, all the Code Points in a Language Character Repertoire will come from the same Script Table.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).

[M] - language identification

Specification of the human language used for a string of text.
Some protocols (such as MIME and HTTP) allow text that is meant for machine processing to be identified with the language used in the text. Such identification is important for machine processing of the text, such as by systems that render the text by speaking it. Language identification is also called "language tagging".
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The IETF "LTRU" standards [RFC5646] and [RFC4647] provide a comprehensive model for language identification.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
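As a small illustration of how language tags are used once text has been identified, the following sketch implements the "basic filtering" scheme of RFC 4647: a language range matches a tag if they are equal, or if the range is a prefix of the tag followed by a hyphen, compared case-insensitively. The function name basic_filter and the sample tags are assumptions made for this example.

```python
from typing import List


def basic_filter(language_range: str, tags: List[str]) -> List[str]:
    """Return the tags matched by a basic language range (RFC 4647 style)."""
    rng = language_range.lower()
    if rng == "*":  # the wildcard range matches every tag
        return list(tags)
    return [
        tag for tag in tags
        if tag.lower() == rng or tag.lower().startswith(rng + "-")
    ]


print(basic_filter("de", ["de", "de-CH", "den", "en-US"]))
```

Note that "den" is not matched by "de": the prefix must end at a hyphen boundary.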

[M] - language table

See IDN table.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Language Variant Table

A three-column table for each Language Character Repertoire permitted to be registered in a zone. The columns are known, respectively, as "Valid Code Point", "Preferred Variant", and "Character Variant", which are defined separately. (This definition differs from RFC 3743 in the substitution of Language Character Repertoire for “language”.) Note that in the rest of this document "Table" and "Variant Table" are not used as short forms for Language Variant Table, as they are in RFC 3743. Note also that it is logically possible a
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).
Language Variant Table: The key mechanisms of this specification utilize a three-column table, called a Language Variant Table, for each language permitted to be registered in the zone. Those columns are known, respectively, as "Valid Code Point", "Preferred Variant", and "Character Variant", which are defined separately below. The Language Variant Tables are critical to the success of the guideline described in this document. However, the principles to be used to generate the tables are not within the scope of this document and should be worked out by each registry separately (perhaps by adopting or adapting the work of some other registry). In this document, "Table" and "Variant Table" are used as short forms for Language Variant Table.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).

[M] - LANGUAGES

Human communication protocols.
(source: ALFA, author: JFC Morfin, text listed to be reviewed and possibly changed).

[M] - Latin characters

"Latin characters" is a not-precise term for characters historically related to ancient Greek script as modified in the Roman Republic and Empire and currently used throughout the world.
The base Latin characters are a subset of the ASCII repertoire and have been augmented by many single and multiple diacritics and quite a few other characters. ISO/IEC 10646 encodes the Latin characters in ranges including U+0020..U+024F and U+1E00..U+1EFF.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Because "Latin characters" is used in different contexts to refer to the letters from the ASCII repertoire, the subset of those characters used late in the Roman Republic period or the different subset used to write Latin in medieval times, the entire ASCII repertoire, all of the code points in the extended Latin script as defined by Unicode, and other collections, the term should be avoided in IETF specifications when possible. Similarly, "Basic Latin" should not be used as a synonym for "ASCII".
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - LDH label

The classical label form used in the DNS and most applications that call on it, albeit with some additional restrictions, reflects the early syntax of "hostnames" [RFC0952] and limits those names to ASCII letters, digits, and embedded hyphens. The hostname syntax is identical to that described as the "preferred name syntax" in Section 3.5 of RFC 1034 [RFC1034] as modified by RFC 1123 [RFC1123]. LDH labels are defined in a more restrictive and precise way for internationalization contexts as part of the IDNA2008 specification [RFC5890].
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
This is the classical label form used, albeit with some additional restrictions, in hostnames [RFC0952]. Its syntax is identical to that described as the "preferred name syntax" in Section 3.5 of RFC 1034 [RFC1034] as modified by RFC 1123 [RFC1123]. Briefly, it is a string consisting of ASCII letters, digits, and the hyphen with the further restriction that the hyphen cannot appear at the beginning or end of the string. Like all DNS labels, its total length must not exceed 63 octets.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
LDH labels include the specialized labels used by IDNA (described as "A-labels" below) and some additional restricted forms (also described below).
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
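The brief description above (ASCII letters, digits, and hyphen, with no hyphen at either end and at most 63 octets) can be checked with a sketch like the following. It covers only those stated rules and not the further restrictions of IDNA2008 [RFC5890]; the name is_ldh_label is hypothetical.

```python
import re

# ASCII letters, digits, hyphen; 1..63 characters; no leading or
# trailing hyphen. For ASCII text, characters and octets coincide.
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label: str) -> bool:
    """True if the label satisfies the basic LDH syntax rules."""
    return LDH_LABEL.fullmatch(label) is not None


for candidate in ["example", "xn--bcher-kva", "-bad", "bad-", "a_b"]:
    print(candidate, is_ldh_label(candidate))
```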

[M] - LDML

(See Locale Data Markup Language.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Leading Consonant

(1) In Korean, a jamo character with the Hangul_Syllable_Type property value Leading_Jamo (in the range U+1100..U+1159 or U+115F hangul choseong filler). Abbreviated as L. (See definition D107 in Section 3.12, Conjoining Jamo Behavior.) (2) Any initial consonant in a syllable.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Leading Surrogate

Synonym for high-surrogate code unit.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Letter

(1) An element of an alphabet. In a broad sense, it includes elements of syllabaries and ideographs. (2) Informative property of characters that are used to write words.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - letters

The term "letters" does not have an exact equivalent in the Unicode standard. Letters are generally characters that are used to write words, but that means very different things in different languages and cultures.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Letters-Digits-Hyphen (LDH)

The set of permissible characters in a domain label, when applying hostname rules.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Ligature

A glyph representing a combination of two or more characters. In the Latin script, there are only a few in modern use, such as the ligatures between “f” and “i” or “f” and “l”. Other scripts make use of many ligatures, depending on the font and style.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Lingualization

Strings from a single language are used in protocols.
(source: IUCG, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - local and regional standards organizations

Just as there are many native CCSs and charsets, there are many local and regional standards organizations to create and support them. Common examples of these are ANSI (United States), CEN/ISSS (Europe), JIS (Japan), and SAC (China).
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - local Internet community

The community of Internet users within a country who benefit from the country’s top-level domain. Country-code top-level domains are delegated to sponsoring organisations to operate domains in the best interests of this community, particularly by implementing policies the community has developed.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - locale

Locale is the user-specific location and cultural information managed by a computer. Because languages and orthographic conventions differ from country to country (and even region to region within a country), the locale of the user can often be an important factor. Typically, the locale information for a user includes the language(s) used.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Locale issues go beyond character use, and can include things such as the display format for currency, dates, and times. Some locales (especially the popular "C" and "POSIX" locales) do not include language information.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
It should be noted that there are many thorny, unsolved issues with locale. For example, should text be viewed using the locale information of the person who wrote the text or the person viewing it? What if the person viewing it is travelling to different locations? Should only some of the locale information affect creation and editing of text?
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Locale Data Markup Language

The XML specification for the exchange of locale data, defined by Unicode Technical Standard #35, “Unicode Locale Data Markup Language (LDML).” (See also Common Locale Data Repository.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - localization

The process of adapting an internationalized application platform or application to a specific cultural environment. In localization, the same semantics are preserved while the syntax may be changed. [FRAMEWORK] Localization is the act of tailoring an application for a different language or script or culture. Some internationalized applications can handle a wide variety of languages. Typical users only understand a small number of languages, so the program must be tailored to interact with users in just the languages they know.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The major work of localization is translating the user interface and documentation. Localization involves not only changing the language interaction, but also other relevant changes such as display of numbers, dates, currency, and so on. The better internationalized an application is, the easier it is to localize it for a particular language and character encoding scheme.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Localization is rarely an IETF matter, and protocols that are merely localized, even if they are serially localized for several locations, are generally considered unsatisfactory for the global Internet.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Do not confuse "localization" with "locale", which is described in Section 8 of this document.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Logical Order

The order in which text is typed on a keyboard. For the most part, logical order corresponds to phonetic order. (See Section 2.2, Unicode Design Principles.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Logical Store

Memory representation.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Logosyllabary

A writing system in which the units are used primarily to write words and/or morphemes of words, with some subsidiary usage to represent just syllabic sounds. The best example is the Han script.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Lowercase

(See case.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Low-Surrogate Code Point

A Unicode code point in the range U+DC00 to U+DFFF. (See definition D73 in Section 3.8, Surrogates.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Low-Surrogate Code Unit

A 16-bit code unit in the range DC00₁₆ to DFFF₁₆, used in UTF-16 as the trailing code unit of a surrogate pair. Also known as a trailing surrogate. (See definition D74 in Section 3.8, Surrogates.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
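How a supplementary code point is split into a leading (high) and trailing (low) surrogate code unit can be sketched as below. The function name to_surrogates is hypothetical and U+1F600 is just a sample code point.

```python
from typing import Tuple


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points use surrogate pairs")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits -> leading surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> trailing surrogate
    return high, low


high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")
```

The result can be cross-checked against Python's own UTF-16 encoder, which produces the same two code units for the same character.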

[M] - LSB

Acronym for least significant byte.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - LZW

Acronym for Lempel-Ziv-Welch, a standard algorithm widely used for compression of data.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Majuscule

Synonym for uppercase. (See case.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Mathematical Property

Informative property of characters that are used as operators in mathematical formulae.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Matra

A dependent vowel in an Indic script. It is the name for vowel letters that follow consonant letters in logical order. A matra often has a completely different letterform from that for the same phonological vowel used as an independent letter.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - MAY

This word, or the adjective "OPTIONAL", mean that an item is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because the vendor feels that it enhances the product while another vendor may omit the same item. An implementation which does not include a particular option MUST be prepared to interoperate with another implementation which does include the option, though perhaps with reduced functionality. In the same vein an implementation which does include a particular option MUST be prepared to interoperate with another implementation which does not include the option (except, of course, for the feature the option provides.)
(source: RFC 2119, author: S. Bradner, text listed to be reviewed and possibly changed).

[M] - MBCS

Abbreviation for multibyte character set.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Metacommunications

Extended services (what you receive is what the network was commanded to deliver) intelligently managed on a fringe to fringe basis through content embedded or parallel metadata exchanges. They are provided by smart local operational tasks (slots) as layer 10 supporting layer 11 as the facilitation agent. The user is layer 12, relational spaces are layer 13, and layer 14 is the world digital ecosystem (WDE).
(source: ALFA/OSEX, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - MIME type

A formalised text string that identifies the type of a file that is included in the headers of an email or web transmission. IANA maintains the registry of MIME types.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
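A minimal sketch of reading a MIME type from message headers, using Python's standard email package. The sample header and body are invented for illustration.

```python
import email

# A tiny message with an illustrative Content-Type header.
raw = "Content-Type: text/html; charset=utf-8\n\n<p>hello</p>\n"
msg = email.message_from_string(raw)

mime_type = msg.get_content_type()     # full type/subtype string
main_type = msg.get_content_maintype()
sub_type = msg.get_content_subtype()
charset = msg.get_content_charset()    # parameter attached to the type

print(mime_type, main_type, sub_type, charset)
```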

[M] - Minimal Well-Formed Code Unit Subsequence

A well-formed Unicode code unit sequence that maps to a single Unicode scalar value. (See definition D85a in Unicode 5.1.0.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
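For UTF-8, the idea can be illustrated by encoding each scalar value on its own: each resulting byte sequence is the code unit subsequence that maps to that single scalar value. This is a sketch under that reading of the definition; the helper name utf8_subsequences and the sample characters are assumptions.

```python
from typing import List


def utf8_subsequences(text: str) -> List[bytes]:
    """Encode each Unicode scalar value separately as UTF-8."""
    return [ch.encode("utf-8") for ch in text]


sample = "A\u00e9\u20ac\U0001F600"  # 1-, 2-, 3- and 4-byte examples
parts = utf8_subsequences(sample)
lengths = [len(p) for p in parts]
print(lengths)
```

Concatenating the subsequences gives the UTF-8 encoding of the whole string.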

[M] - Minuscule

Synonym for lowercase. (See case.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Mirrored Property

The property of characters whose images are mirrored horizontally in text that is laid out from right to left (versus from left to right). (See Section 4.7, Bidi Mirrored—Normative.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Missing Glyph

(See replacement glyph.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - ML-DNS

This is an architectural concept that encapsulates the DNS, into the IDNS, into the IUDNS. Its purpose is to resolve any DN, IDN, or IUDN string, as part of the DN pile of all its possible polynyms, into its addresses in the world digital ecosystem.
(source: IUTF, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - Modifier Letter

A character with the Lm General Category in the Unicode Character Database. Modifier letters, which look like letters or punctuation, modify the pronunciation of other letters (similar to diacritics). (See Section 7.8, Modifier Letters.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Monotonic

Modern Greek written with the basic accent, the tonos.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Mora

A phonological term: the unit of sound that determines syllable weight in some languages. Some syllabaries have characteristics that reflect moraic structure more or less exactly. In particular, the Japanese kana syllabaries actually write one character per mora, rather than one character per syllable. The Vai syllabary also counts final nasals as distinct moras, and writes moras instead of syllables.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - MSB

Acronym for most significant byte.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Multibyte Character Set

A character set encoded with a variable number of bytes per character, often abbreviated as MBCS. Many large character sets have been defined as MBCS so as to keep strict compatibility with the ASCII subset and/or ISO/IEC 2022.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
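The variable width and the ASCII compatibility mentioned above can be seen in a short sketch. Shift_JIS is chosen here only as one familiar example of a multibyte character set; the helper name byte_lengths is hypothetical.

```python
from typing import List


def byte_lengths(text: str, encoding: str) -> List[int]:
    """Number of bytes each character occupies in the given encoding."""
    return [len(ch.encode(encoding)) for ch in text]


# An ASCII letter keeps its single-byte value; a hiragana letter takes two bytes.
print(byte_lengths("Aあ", "shift_jis"))
print("A".encode("shift_jis"), "あ".encode("shift_jis"))
```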

[M] - multilingual

The term "multilingual" has many widely-varying definitions and thus is not recommended for use in standards. Some of the definitions relate to the ability to handle international characters; other definitions relate to the ability to handle multiple charsets; and still others relate to the ability to handle multiple languages.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Multilingualization

Every language is technically and socially treated the same.
(source: ALFA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - MUST

This word, or the terms "REQUIRED" or "SHALL", mean that the definition is an absolute requirement of the specification.
(source: RFC 2119, author: S. Bradner, text listed to be reviewed and possibly changed).

[M] - MUST NOT

This phrase, or the phrase "SHALL NOT", mean that the definition is an absolute prohibition of the specification.
(source: RFC 2119, author: S. Bradner, text listed to be reviewed and possibly changed).

[M] - Name aliasing:

Refers to the abstract concept of two or more domain names "behaving as one". The concept might be implemented in multiple ways, by policy or technical means, none of which has been agreed upon as completely successful. {This is by all means incomplete; feel free to improve it.}
(source: ICANN, author: WG/VIP, text listed to be reviewed and possibly changed).

[M] - name server

See domain name server.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - NameClass

a sequence of letters, numbers, and symbols that is used to identify or address a network entity such as a user, an account, a venue (e.g., a chatroom), an information source (e.g., a data feed), or a collection of data (e.g., a file).
(source: draft-blanchet-precis-framework-03, author: M. Blanchet, P. Saint-Andre, text listed to be reviewed and possibly changed).

[M] - Named Unicode Algorithm

A Unicode algorithm that is specified in the Unicode Standard or in other standards published by the Unicode Consortium and that is given an explicit name for ease of reference. (See definition D18 in Section 3.4, Characters and Encoding. See also Table 3-1, “Named Unicode Algorithms,” for a list of named Unicode algorithms.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - NAT

see Network Address Translation.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Nekudot

Marks that indicate vowels or other modifications of consonantal letters in Hebrew.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - network address translation (NAT)

A system of using private IP addresses within an internal network (such as within a home, an office, or even within an ISP), and then having those numbers converted into a real IP address when Internet traffic leaves that network using a specialised router. This is commonly used within homes, for example, so that users do not have to apply for an extra IP address each time they connect a device to the network. It is very similar to using “extension numbers” within an office telephone system.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Neutral Character

A character that can be written either right to left or left to right, depending on context. (See Unicode Standard Annex #9, “Unicode Bidirectional Algorithm.”)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - NFC

(See Normalization Form C.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - NFD

(See Normalization Form D.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - NFK

[M] - NFKC

(See Normalization Form KC.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - NFKD

(See Normalization Form KD.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - non-ASCII

The term "non-ASCII" strictly refers to characters other than those that appear in the ASCII repertoire, independent of the CCS or encoding used for them. In practice, if a repertoire such as that of Unicode is established as context, "non-ASCII" refers to characters in that repertoire that do not appear in the ASCII repertoire. "Outside the ASCII repertoire" and "outside the ASCII range" are practical, and more precise, synonyms for "non-ASCII".
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Noncharacter

A code point that is permanently reserved for internal use and that should never be interchanged. Noncharacters consist of the values U+nFFFE and U+nFFFF (where n is from 0 to 10₁₆), and the values U+FDD0..U+FDEF.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
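
The ranges in this definition translate directly into a code point test. The following is a minimal sketch; the function name `is_noncharacter` is hypothetical and not taken from any standard library.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the Unicode noncharacter code points."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # The contiguous block FDD0..FDEF in Plane 0.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: nFFFE and nFFFF, n = 0..0x10.
    return (cp & 0xFFFF) in (0xFFFE, 0xFFFF)


total = sum(is_noncharacter(cp) for cp in range(0x110000))
print(total)  # 32 in the FDD0 block plus 2 per plane over 17 planes
```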

[M] - Non-joiner

[M] - Non-overridable

A characteristic of a Unicode character property that cannot be changed by a higher-level protocol.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - nonspacing character

A combining character whose positioning in presentation is dependent on its base character. It generally does not consume space along the visual baseline in and of itself. <UNICODE> A combining acute accent (U+0301) is an example of a nonspacing character.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Nonspacing Diacritic

A diacritic that is a nonspacing mark.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Nonspacing Mark

A combining character with the General Category of Nonspacing Mark (Mn) or Enclosing Mark (Me). (See definition D53 in Section 3.6 Combination.) The position of a nonspacing mark in presentation depends on its base character. It generally does not consume space along the visual baseline in and of itself. (See also combining character.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
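
The General Category test in this definition can be checked against the Unicode Character Database shipped with Python's standard `unicodedata` module. This is only a sketch; `is_nonspacing_mark` is a hypothetical helper name.

```python
import unicodedata


def is_nonspacing_mark(ch: str) -> bool:
    """True if ch has General Category Mn (Nonspacing Mark) or Me (Enclosing Mark)."""
    return unicodedata.category(ch) in ("Mn", "Me")


print(is_nonspacing_mark("\u0301"))  # COMBINING ACUTE ACCENT
print(is_nonspacing_mark("\u20DD"))  # COMBINING ENCLOSING CIRCLE
print(is_nonspacing_mark("a"))       # an ordinary base letter
```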

[M] - Non-starter Decomposition

A canonical decomposition mapping to a sequence of more than one character, for which the first character in that sequence is not a Starter. (Used in the definition of Unicode Normalization Forms.) (See definition D111 in Section 3.11, Normalization Forms.)
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - normalization

Normalization is the transformation of data to a normal form, for example, to unify spelling. <UNICODE> Note that the phrase "unify spelling" in the definition above does not mean unifying different strings with the same meaning as words (such as "color" and "colour"). Instead, it means unifying different character sequences that are intended to form the same composite characters, such as "<n><combining tilde>" and "<n with tilde>" (where "<n>" is U+006E, "<combining tilde>" is U+0303, and "<n with tilde>" is U+00F1).
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The purpose of normalization is to allow two strings to be compared for equivalence. The strings "<a><n><combining tilde><o>" and "<a><n with tilde><o>" would be shown identically on a text display device. If a protocol designer wants those two strings to be considered equivalent during comparison, the protocol must define where normalization occurs.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The terms "normalization" and "canonicalization" are often used interchangeably. Generally, they both mean to convert a string of one or more characters into another string based on standardized rules. However, in Unicode, "canonicalization" or its variants are used to refer to a particular type of normalization equivalence ("canonical equivalence" in contrast to "compatibility equivalence"), so the term should be used with some care. Some CCSs allow multiple equivalent representations for a written string; normalization selects one among multiple equivalent representations as a base for reference purposes in comparing strings. In strings of text, these rules are usually based on decomposing combined characters or composing characters with combining characters. Unicode Standard Annex #15 [UTR15] describes the process and many forms of normalization in detail.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Normalization is important when comparing strings to see if they are the same.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The Unicode NFC and NFD normalizations support canonical equivalence; NFKC and NFKD support canonical and compatibility equivalence.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
A process of removing alternate representations of equivalent sequences from textual data, to convert the data into a form that can be binary-compared for equivalence. In the Unicode Standard, normalization refers specifically to processing to ensure that canonical-equivalent (and/or compatibility-equivalent) strings have unique representations. For more information, see “Equivalent Sequences” in Section 2.2, Unicode Design Principles, and Unicode Standard Annex #15, “Unicode Normalization Forms.”
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
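
The comparison problem described in these definitions can be reproduced with the two "año"-style strings mentioned above. The sketch below uses Python's standard `unicodedata.normalize`; the helper `equivalent` and the choice of NFC as the default form are assumptions made for illustration, since the definitions leave it to each protocol to decide where and how normalization occurs.

```python
import unicodedata

# <a><n><combining tilde><o> versus <a><n with tilde><o>
decomposed = "a" + "n" + "\u0303" + "o"
precomposed = "a" + "\u00F1" + "o"


def equivalent(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(decomposed == precomposed)            # raw comparison of code points
print(equivalent(decomposed, precomposed))  # comparison after normalization
```

A raw comparison sees two different code point sequences, while the normalized comparison treats them as the same string.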

[M] - Normalization Form

One of the four Unicode normalization forms defined in Unicode Standard Annex #15, “Unicode Normalization Forms,” namely NFC, NFD, NFKC, and NFKD.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Normalization Form C (NFC)

A normalization form that erases any canonical differences, and generally produces a composed result. For example, a + umlaut is converted to ä in this form. This form most closely matches legacy usage. The formal definition is D120 in Section 3.11, Normalization Forms.
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).
The normalization form that results from the canonical decomposition of a Unicode string, followed by the replacement of all decomposed sequences by primary composites where possible.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Normalization Form D (NFD)

A normalization form that erases any canonical differences, and produces a decomposed result. For example, ä is converted to a + umlaut in this form. This form is most often used in internal processing, such as in collation. The formal definition is D118 in Section 3.11, Normalization Forms.
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).
The normalization form that results from the canonical decomposition of a Unicode string.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Normalization Form KC (NFKC)

A normalization form that erases both canonical and compatibility differences, and generally produces a composed result: for example, the single ǆ character is converted to d + ž in this form. This form is commonly used in matching. The formal definition is D121 in Section 3.11, Normalization Forms.
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).
The normalization form that results from the compatibility decomposition of a Unicode string, followed by the replacement of all decomposed sequences by primary composites where possible.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Normalization Form KD (NFKD)

A normalization form that erases both canonical and compatibility differences, and produces a decomposed result: for example, the single ǆ character is converted to d + z + caron in this form. The formal definition is D119 in Section 3.11, Normalization Forms.
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).
The normalization form that results from the compatibility decomposition of a Unicode string.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
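
The four forms can be compared side by side with Python's standard `unicodedata.normalize`. The sketch below applies them to ä (U+00E4) and to the single-character digraph ǆ (U+01C6), which has only a compatibility decomposition; the helper names `all_forms` and `code_points` are hypothetical.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def all_forms(text: str) -> dict:
    """Return the text in each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


def code_points(text: str) -> str:
    """Render a string as a list of U+XXXX code points for inspection."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for sample in ("\u00E4", "\u01C6"):
    for form, result in all_forms(sample).items():
        print(code_points(sample), form, "->", code_points(result))
```

Only the K forms touch U+01C6, because the digraph differs from d + ž by a compatibility difference rather than a canonical one.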

[M] - Normative

Required for conformance with the Unicode Standard.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - NR-LDH label

Non-Reserved LDH labels are the set of valid LDH labels that do not have "--" in the third and fourth positions.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
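
A minimal sketch of this test follows. It assumes the usual meaning of "LDH label" (1 to 63 ASCII letters, digits, or hyphens, with no hyphen at either end), which is not spelled out in the definition above; the function names are hypothetical.

```python
import re

# Assumed LDH syntax: letters, digits, hyphens; no leading or trailing hyphen.
_LDH = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label: str) -> bool:
    return _LDH.fullmatch(label) is not None


def is_nr_ldh_label(label: str) -> bool:
    """An LDH label that does not have '--' in the third and fourth positions."""
    return is_ldh_label(label) and label[2:4] != "--"


for label in ("example", "a--b", "ab--cd", "xn--bcher-kva", "-bad"):
    print(label, is_ldh_label(label), is_nr_ldh_label(label))
```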

[M] - NS record

a type of record in a DNS zone that signifies part of that zone is delegated to a different set of authoritative name servers. Operators of domain names must have their authoritative name servers correctly listed in the parent domain.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - NSM

Acronym for nonspacing mark.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - number resources

Used to describe the hierarchically assigned number resources used for Internet routing, namely IP addresses and autonomous system numbers. These are usually distributed through regional Internet registries.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Numeric Value Property

A property of characters used to represent numbers. (See Section 4.6, Numeric Value—Normative)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - object identifier

see Private Enterprise Number.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Obsolete

Applies to a character that is no longer in current use, but that has been used historically. Whether a character is obsolete depends on context. For example, the Cyrillic letter big yus is obsolete for Russian, but is used in modern Bulgarian. (Not the same as deprecated.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - onal Sandhi

[M] - on-the-wire encoding

The encoding and decoding used before and after transmission over the network is often called the "on-the-wire" (or sometimes just "wire") format. <NONE> Characters are identified by code points. Before being transmitted in a protocol, they must first be encoded as bits and octets. Similarly, when characters are received in a transmission, they have been encoded, and a protocol that needs to process the individual characters needs to decode them before processing.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
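
The encode-before-sending and decode-after-receiving steps can be sketched in a few lines. UTF-8 is assumed here as the wire encoding purely for illustration; the definition above does not prescribe one.

```python
text = "se\u00F1or"            # five characters, identified by code points

wire = text.encode("utf-8")    # sender: characters become octets
print(len(text), len(wire), wire)

received = wire.decode("utf-8")  # receiver: octets become characters again
print(received == text)
```

The character count and the octet count differ, which is why a protocol that processes individual characters has to decode first.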

[M] - orthotypography

The term ‘orthotypography’ is seen from the viewpoint of readability. It means the correct use of typographic signs to convey the intended semantic or context. Also known as "Typographical syntax", orthotypography defines the meaning and rightful usage of typographic signs and cases. Orthotypographic rules may vary broadly from language to language, from country to country, etc.
(source: IUCG, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[R] - out-of-band

An out-of-band channel conveys additional information about text in such a way that the textual content, as encoded, is completely untouched and unmodified. This is typically done by separate data structures that point into the text.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Overridable

A characteristic of a Unicode character property that may be changed by a higher-level protocol to create desired implementation effects.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Oxia

Greek term for acute accent, used in polytonic Greek character names.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Paragraph Direction

The default direction (left or right) of the text of a paragraph. This direction does not change the display order of characters within an Arabic or English word. However, it does change the display order of adjacent Arabic and English words, and the display order of neutral characters, such as punctuation and spaces. For more details, see Unicode Standard Annex #9, “Unicode Bidirectional Algorithm,” especially definitions BD2–BD5.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Paragraph Embedding Level

The embedding level that determines the default bidirectional orientation of the text in that paragraph.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - parent domain

The domain above a domain in the DNS hierarchy. For all top-level domains, the Root Zone is the parent domain. The Root Zone has no parent domain as it is at the top of the hierarchy. Opposite of sub-domain.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - parsed text

Text strings that are analyzed for subparts. <NONE> In some protocols, free text in text fields might be parsed. For example, many mail user agents (MUAs) will parse the words in the text of the Subject: field to attempt to thread based on what appears after the "Re:" prefix.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Such conventions are very sensitive to localization. If, for example, a form like "Re:" is altered by an MUA to reflect the language of the sender or recipient, a system that subsequently does threading may not recognize the replacement term as a delimiter string.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
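
The sensitivity to localization can be seen in a toy threading routine. This is a hypothetical sketch, not how any particular MUA works: `thread_key` strips only the literal "Re:" prefix, and "AW:" stands in for a localized reply prefix as an illustrative example.

```python
import re

# Only the English-style "Re:" prefix is recognized, possibly repeated.
_REPLY_PREFIX = re.compile(r"^(?:re:\s*)+", re.IGNORECASE)


def thread_key(subject: str) -> str:
    """Reduce a Subject: value to a key used to group messages into threads."""
    return _REPLY_PREFIX.sub("", subject.strip()).lower()


print(thread_key("Meeting notes"))
print(thread_key("Re: RE: Meeting notes"))
print(thread_key("AW: Meeting notes"))  # localized prefix is not recognized
```

The third message ends up with a different key and so falls out of the thread.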

[M] - PEN

[M] - Perispomeni

Greek term for circumflex accent, used in polytonic Greek character names.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Phoneme

A minimally distinct sound in the context of a particular spoken language. For example, in American English, /p/ and /b/ are distinct phonemes because pat and bat are distinct; however, the two different sounds of /t/ in tick and stick are not distinct in English, even though they are distinct in other languages such as Thai.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Pinyin

Standard system for the romanization of Chinese on the basis of Mandarin pronunciation.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Pivot Conversion

The use of a third character encoding to serve as an intermediate step in the conversion between two other character encodings. The Unicode Standard is widely used to support pivot conversion, as its character repertoire is a superset of most other coded character sets.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Plane

A range of 65,536 (10000₁₆) contiguous ISO 10646 code points, where the first code point is an integer multiple of 65,536 (10000₁₆). Planes are numbered from 0 to 16, with the number being the first code point of the plane divided by 65,536. Thus Plane 0 is U+0000..U+FFFF, Plane 1 is U+10000..U+1FFFF, ..., and Plane 16 (10₁₆) is U+100000..U+10FFFF. (Note that ISO/IEC 10646 uses hexadecimal notation for the plane numbers, for example, Plane B instead of Plane 11.) (See Basic Multilingual Plane and supplementary planes.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
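
The arithmetic in this definition (plane number equals first code point divided by 65,536) can be written out as a short sketch. The function names `plane_of` and `plane_range` are hypothetical.

```python
PLANE_SIZE = 0x10000  # 65,536 code points per plane


def plane_of(cp: int) -> int:
    """Return the plane number (0..16) that contains the code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return cp // PLANE_SIZE


def plane_range(plane: int) -> tuple:
    """Return the first and last code point of a plane."""
    if not 0 <= plane <= 16:
        raise ValueError("planes are numbered 0 to 16")
    first = plane * PLANE_SIZE
    return first, first + PLANE_SIZE - 1


print(plane_of(0x0041), plane_of(0x1F600), plane_of(0x10FFFF))
print([hex(v) for v in plane_range(16)])
```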

[M] - Plurilingualization

Several languages can be indifferently used by the iusers.
(source: IUCG, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - Points

(1) The nonspacing vowels and other signs of written Hebrew. (2) A unit of measurement in typography.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Policy Development Process (PDP)

The formal policy creation process employed by ICANN by a number of its constituencies.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - polynym

alternative name of an address in the same or another language.
(source: IUCG, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - Polytonic

Ancient Greek written with several contrastive accents.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - port number

A number used for identifying the type of Internet traffic being transmitted between two computers over the Internet. For example, the web uses port 80, DNS uses port 53, and email uses port 25. IANA assigns these numbers, and it is one of the more high profile protocol registries IANA maintains.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Postel, Jon

The progenitor of IANA. A computer scientist responsible for IANA until 1998, initially individually and later with other IANA staff within the University of Southern California. He was also responsible for the RFC Editor.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Precomposed Character

(See decomposable character.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Preferred Variant

In a Language Variant Table, a list of Code Points corresponding to each Valid Code Point and providing possible substitutions for it. These substitutions are "preferred" in the sense that the variant labels generated using them are normally registered in the zone file, or "activated." The Preferred Code Points appear in column 2 of the Language Variant Table. "Preferred Code Point" is used interchangeably with this term. (RFC 3743)
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).
Preferred Variant: In a Language Variant Table, a list of Code Points corresponding to each Valid Code Point and providing possible substitutions for it. These substitutions are "preferred" in the sense that the variant labels generated using them are normally registered in the zone file, or "activated." The Preferred Code Points appear in column 2 of the Language Variant Table. "Preferred Code Point" is used interchangeably with this term.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).

[M] - Preferred Variant Label

A U-label generated by use of Preferred Variants. The Preferred Variant Label must contain at least one Preferred Variant, but need not contain all the Preferred Variants possible for the Fundamental Label. This definition differs from that in RFC 3743 by specifying “U-label” rather than “label”.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).
Preferred Variant Label: A label generated by use of Preferred Variants (or Preferred Code Points).
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).

[M] - Preferred Variant TLD

The Preferred Variant Label form(s) of a Variant TLD Set.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).

[M] - Presentation Form

A ligature or variant glyph that has been encoded as a character for compatibility. (See also compatibility character (1).)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Primary Composite

A Canonical Decomposable Character which is not a Full Composition Exclusion. (Used in the definition of Unicode Normalization Forms.) (See definition D114 in Section 3.11, Normalization Forms.)
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).
A character that has a canonical decomposition mapping in the Unicode Character Database (or is a canonical Hangul decomposition) but that is not in the Composition Exclusion Table. (See Unicode Standard Annex #15, “Unicode Normalization Forms.”)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Principles for the Delegation and Administration of ccTLDs

See GAC Principles.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - private enterprise numbers (PENs)

A unique numbering system used by several different Internet protocols (such as SNMP and LDAP) that use Abstract Syntax Notation One (ASN.1). It can be used to label services within an organisation. A company may apply for a private enterprise number to use these numbering systems without conflicting with other companies and users. Private enterprise numbers are a subset of the numbers known as Object Identifiers, or OIDs.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - private IP addresses

A set of IP addresses only used within private networks, and therefore not reachable from the global Internet. Commonly used within home or office networks in conjunction with network address translation, which converts private IP addresses into a valid IP address when data leaves the local network. IANA maintains some special ranges of IP addresses solely for use as private IP addresses, as described in technical standards RFC 1918 and 3927.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
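
A membership test against these special ranges might look like the sketch below. The definition cites RFC 1918 and RFC 3927 without listing the blocks; the four IPv4 prefixes used here are the ones those two documents define. The name `is_private_use` is hypothetical, and the sketch covers IPv4 only.

```python
import ipaddress

# RFC 1918 private blocks plus the RFC 3927 link-local block.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("169.254.0.0/16"),
]


def is_private_use(address: str) -> bool:
    """True if the IPv4 address falls inside one of the listed blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in PRIVATE_NETWORKS)


for addr in ("192.168.1.10", "172.31.255.255", "172.32.0.1", "8.8.8.8"):
    print(addr, is_private_use(addr))
```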

[M] - Private Use

Refers to designated code points in the Unicode Standard or other character encoding standards whose interpretations are not specified in those standards and whose use may be determined by private agreement among cooperating users.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Private Use Area (PUA)

Any one of the three blocks of private-use code points in the Unicode Standard.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Private-Use Code Point

Code points in the ranges U+E000..U+F8FF, U+F0000..U+FFFFD, and U+100000..U+10FFFD. (See definition D49 in Section 3.5, Properties.) These code points are designated in the Unicode Standard for private use.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
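
The three ranges can be turned into a simple lookup, sketched below with a hypothetical helper name `is_private_use_code_point`.

```python
# The three private-use ranges, as (first, last) code point pairs.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),       # Private Use Area in Plane 0
    (0xF0000, 0xFFFFD),     # Plane 15
    (0x100000, 0x10FFFD),   # Plane 16
)


def is_private_use_code_point(cp: int) -> bool:
    """True if cp lies in one of the three private-use ranges."""
    return any(first <= cp <= last for first, last in PRIVATE_USE_RANGES)


total = sum(last - first + 1 for first, last in PRIVATE_USE_RANGES)
print(total)
print(is_private_use_code_point(0xE000), is_private_use_code_point(0xFFFFE))
```

The last two code points of Planes 15 and 16 are excluded from the ranges, which is why each of those planes contributes 65,534 rather than 65,536 code points.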

[M] - Productive

Said of a feature or rule that can be employed in novel combinations or circumstances, rather than being restricted to a fixed list. In the Unicode Standard, combining marks—particularly the accents—are productive. In contrast, variation selectors are deliberately not productive. Also known as generative.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Property

(See character properties.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Property Alias

A unique identifier for a particular Unicode character property. (See definition D47 in Section 3.5, Properties.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Property Value Alias

A unique identifier for a particular enumerated value for a particular Unicode character property. (See definition D48 in Section 3.5, Properties.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Prosgegrammeni

Greek term for adscript iota, used in polytonic Greek character names.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[R] - protocol

[M] - PROTOCOL

A formal description of messages to be exchanged and rules to be followed for two or more systems to exchange information.
(source: , author: , text listed to be reviewed and possibly changed).
A formal description of message formats and the rules two computers must follow to exchange those messages. Protocols can describe low-level details of machine-to-machine interfaces (e.g., the order in which bits and bytes are sent across a wire) or high-level exchanges between application programs (e.g., the way in which two programs transfer a file across the Internet). [Source: MALAMUD]
(source: , author: , text listed to be reviewed and possibly changed).
1a. (I) A set of rules (i.e., formats and procedures) to implement and control some type of association (e.g., communication) between systems. Example: Internet Protocol.
1b. (I) A series of ordered computing and communication steps that are performed by two or more system entities to achieve a joint objective. [A9042]
(source: , author: , text listed to be reviewed and possibly changed).
Any form of inter-computer communication that has been standardised to ensure computers can communicate to one another. Internet protocols are usually standardised in RFCs.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
What document data interchanges.
(source: ALFA, author: JFC Morfin, text listed to be reviewed and possibly changed).

[M] - protocol assignments

The assignment of protocol parameters by IANA.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - protocol elements

Protocol elements are uniquely-named parts of a protocol. <NONE> Almost every protocol has named elements, such as "source port" in TCP. In some protocols, the names of the elements (or text tokens for the names) are transmitted within the protocol. For example, in SMTP and numerous other IETF protocols, the names of the verbs are part of the command stream. The names are thus part of the protocol standard. The names of protocol elements are not normally seen by end users and it is rarely appropriate to internationalize protocol element names (even while the elements themselves can be internationalized).
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Name spaces: A name space is the set of valid names for a particular item, or the syntactic rules for generating these valid names. Many items in Internet protocols use names to identify specific instances or values. The names may be generated (by some prescribed rules), registered centrally (e.g., with IANA), or have a distributed registration and control mechanism, such as the names in the DNS.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - protocol parameters

Unique systems of numbering or encoding used by protocols that must be consistently applied for the protocols to be interoperable. The global unique assignment of protocol parameters is the task of IANA.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - protocol registry

An individual protocol parameter registry managed by IANA, usually tied to a specific Internet standard.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Provisional

A property or feature that is unapproved and tentative, and that may be incomplete or otherwise not in a usable state.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Psili

Greek term for smooth breathing mark, used in polytonic Greek character names.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - PTR record

The representation of an IP address to domain name mapping in the DNS system.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
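
The name under which a PTR record is looked up can be derived from the address itself. The sketch below relies on the `reverse_pointer` attribute of Python's standard `ipaddress` module; the wrapper `ptr_name` is a hypothetical helper, and the addresses are documentation examples.

```python
import ipaddress


def ptr_name(address: str) -> str:
    """Return the reverse-lookup domain name for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(address).reverse_pointer


print(ptr_name("192.0.2.1"))
print(ptr_name("2001:db8::1"))
```

IPv4 names end in in-addr.arpa and IPv6 names in ip6.arpa, both under the .ARPA top-level domain.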

[M] - PUA

Acronym for Private Use Area.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Pulli

The Tamil name for virama. (See virama.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - punctuation

Characters that separate units of text, such as sentences and phrases, thus clarifying the meaning of the text. The use of punctuation marks is not limited to prose; they are also used in mathematical and scientific formulae, for example.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Punycode

Punycode [RFC 3492] is a mechanism for encoding a Unicode string in an ASCII-compatible encoding, i.e., using only letters, digits, and hyphens from the ASCII character set. When a Unicode label that is valid under the IDNA rules (a U-label) is encoded with Punycode for IDNA purposes, it is prefixed with "xn--"; the result is called an A-label. The prefix convention assumes that no other DNS labels (at least no other DNS labels in IDNA-aware applications) are allowed to start with these four characters. Consequently, when A-label encoding is assumed, any DNS labels beginning with "xn--" now have a different meaning (the Punycode encoding of a label containing one or more non-ASCII characters) or no defined meaning at all (in the case of labels that are not IDNA-compliant, i.e., are not well-formed A-labels).
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).
This is the name of the algorithm [RFC3492] used to convert otherwise-valid IDN labels from native-character strings expressed in Unicode to an ASCII-compatible encoding (ACE).
Strictly speaking, the term applies to the algorithm only. In practice, it is widely, if erroneously, used to refer to strings that the algorithm encodes.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
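The difference between the bare algorithm output and the prefixed A-label can be sketched with the `punycode` codec from Python's standard library. The helper names `to_a_label` and `from_a_label` are hypothetical, and the sketch performs none of the IDNA validity checks that a real U-label would have to pass first.

```python
ACE_PREFIX = "xn--"

def to_a_label(u_label: str) -> str:
    """Punycode-encode a label and add the IDNA prefix (no IDNA validation)."""
    if u_label.isascii():
        return u_label  # pure ASCII labels are left untouched
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")

def from_a_label(label: str) -> str:
    """Strip the prefix and decode the remainder with the Punycode algorithm."""
    if not label.lower().startswith(ACE_PREFIX):
        return label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

print("bücher".encode("punycode"))  # algorithm output only: b'bcher-kva'
print(to_a_label("bücher"))         # prefixed form: xn--bcher-kva
```

Keeping the prefix handling separate from the codec call mirrors the point made above: "Punycode" names the algorithm, not the prefixed string.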

[M] - Radical

A structural component of a Han character conventionally used for indexing. The traditional number of such radicals is 214.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - recursive name server

A domain name server configured to perform DNS lookups on behalf of other computers. This is often configured at corporate network boundaries and ISPs for their network customers to use. As an individual domain name lookup can often involve multiple queries to different servers, these name servers perform the iterative lookups and return only the final answer to the requesting computer. They are often combined with the functions of a caching name server to improve network performance, and therefore are also known as caching resolvers.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - redelegation

The transfer of a delegation from one entity to another. Most commonly used to refer to the redelegation process used for top-level domains.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Redelegation process

A special type of root zone change where there is a significant change involving the transfer of operations of a top-level domain to a new entity. Such a change must be evaluated by ICANN staff to ensure that the new entity meets a number of criteria, and must be voted on and agreed by the ICANN Board of Directors.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Regional Internet Registry (RIR)

A registry responsible for allocation of IP address resources within a particular region. There are five RIRs, and within each region network operators apply to their RIR to get IP address blocks allocated.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - registrant

The entity that has acquired the right to use an Internet resource. Usually this is via some form of revocable grant given by a registrar to list their registration in a registry.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - registrar

An entity that can act on requests from a registrant in making changes in a registry. Usually the registrar is the same entity that operates a registry, although for domain names this role is often split to allow for competition between multiple registrars who offer different levels of support. See also domain name registrar.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Registration

Registration: In this document, the term "registration" refers to the process by which a potential domain name holder requests that a label be placed in the DNS either as an individual name within a domain or as a subdomain delegation from another domain name holder. In the case of a successful registration, the label or delegation records are placed in the relevant zone file, or, more specifically, they are "activated" or made "active" and additional IDLs may be reserved as part of an "IDL Package" (see below). The guidelines presented here are recommended for all zones, at any hierarchy level, in which CJK characters are to appear and not just domains at the first or second level.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).

[M] - registry

1. The authoritative record of registrations for a particular set of data. Most often used to refer to domain name registry, but all protocol parameters that IANA maintains are also registries. 2. registry operator.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - registry operator

The entity that runs a registry.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - regular expressions

Regular expressions provide a mechanism to select specific strings from a set of character strings. Regular expressions are a language used to search for text within strings, and possibly modify the text found with other text. <NONE> Pattern matching for text involves being able to represent one or more code points in an abstract notation, such as searching for all capital Latin letters or all punctuation. The most common mechanism in IETF protocols for naming such patterns is the use of regular expressions. There is no single regular expression language, but there are numerous very similar dialects that are not quite consistent with each other.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
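The two searches mentioned above (capital Latin letters and punctuation) might look as follows in one particular dialect, Python's standard `re` module. This is only a sketch: that dialect has no property syntax for Unicode general categories, so the punctuation test is done per character with `unicodedata`. The sample string and the helper name `is_punctuation` are made up for illustration.

```python
import re
import unicodedata

text = "Hello, World! ¿Qué tal?"

# Capital Latin letters, restricted here to the ASCII range A-Z.
ascii_capitals = re.findall(r"[A-Z]", text)

def is_punctuation(ch: str) -> bool:
    """True if the character's general category is one of the P* classes."""
    return unicodedata.category(ch).startswith("P")

punctuation = [ch for ch in text if is_punctuation(ch)]

print(ascii_capitals)  # ['H', 'W', 'Q']
print(punctuation)     # [',', '!', '¿', '?']
```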
The Unicode Consortium has a good discussion about how to adapt regular expression engines to use Unicode. [UTR18]
private use: ISO/IEC 10646 code points from U+E000 to U+F8FF, U+F0000 to U+FFFFD, and U+100000 to U+10FFFD are available for private use.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
This refers to code points of the standard whose interpretation is not specified by the standard and whose use may be determined by private agreement among cooperating users. <UNICODE> The use of these "private use" characters is defined by the parties who transmit and receive them, and is thus not appropriate for standardization. (The IETF has a long history of private use names for things such as "x-" names in MIME types, charsets, and languages. Most of the experience with these has been quite negative, with many implementors assuming that private use names are in fact public and long-lived.)
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
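The three private use ranges quoted above can be checked with a simple range test. The function name `is_private_use` is hypothetical; the cross-check against the general category "Co" relies on Python's `unicodedata` module.

```python
import unicodedata

# Ranges as listed in the definition above (inclusive bounds).
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)

def is_private_use(cp: int) -> bool:
    """True if the code point falls in one of the private use ranges."""
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)

print(is_private_use(0xE000))                 # True
print(is_private_use(0x0041))                 # False
print(unicodedata.category(chr(0xE000)))      # Co
```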

[M] - Rendering

(1) The process of selecting and laying out glyphs for the purpose of depicting characters. (2) The process of making glyphs visible on a display device.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - rendering rules

A rendering rule is an algorithm that a system uses to decide how to display a string of text. <NONE> Some scripts can be directly displayed with fonts, where each character from an input stream can simply be copied from a glyph system and put on the screen or printed page. Other scripts need rules that are based on the context of the characters in order to render text for display.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Some examples of these rendering rules include:
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
  • Scripts such as Arabic (and many others), where the form of the letter changes depending on the adjacent letters, whether the letter is standing alone, at the beginning of a word, in the middle of a word, or at the end of a word. The rendering rules must choose between two or more glyphs.
    (source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
  • Scripts such as the Indic scripts, where consonants may change their form if they are adjacent to certain other consonants or may be displayed in an order different from the way they are stored and pronounced. The rendering rules must choose between two or more glyphs.
    (source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
  • Arabic and Hebrew scripts, where the order in which the characters are displayed is changed by the bidirectional properties of the alphabetic and other characters and by right-to-left and left-to-right ordering marks. The rendering rules must choose the order in which characters are displayed.
    (source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
  • Some writing systems cannot have their rendering rules suitably defined using mechanisms that are now defined in the Unicode Standard. None of those languages are in active non-scholarly use today.
    (source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
  • Many systems use a special rendering rule when they lack a font or other mechanism for rendering a particular character correctly. That rule typically involves substitution of a small open box or a question mark for the missing character.
    (source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
See "undisplayable character" below.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
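The bidirectional properties mentioned in the list above are per-character data that a renderer consults. The sketch below only looks those properties up with Python's `unicodedata` module; it does not implement any reordering, and the sample characters are chosen purely for illustration.

```python
import unicodedata

samples = {
    "A": "Latin capital letter A",
    "\u05D0": "Hebrew letter alef",
    "\u0627": "Arabic letter alef",
    "\u200F": "right-to-left mark",
    "\u200E": "left-to-right mark",
}

def bidi_class(ch: str) -> str:
    """Return the bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)

for ch, description in samples.items():
    print(f"U+{ord(ch):04X} {description}: {bidi_class(ch)}")
```

A rendering system would feed these classes into its ordering rules; deciding the final display order is a separate and considerably larger step.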

[M] - Reorderable Pair

Two adjacent characters A and B in a coded character sequence <A, B> are a Reorderable Pair if and only if ccc(A) > ccc(B) > 0. (Used in the definition of Unicode Normalization Forms.) (See definition D108 in Section 3.11, Normalization Forms.)
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).
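The condition ccc(A) > ccc(B) > 0 can be tested directly, assuming Python's `unicodedata.combining` as the source of canonical combining class values. The function name `is_reorderable_pair` is hypothetical.

```python
import unicodedata

def is_reorderable_pair(a: str, b: str) -> bool:
    """True if <a, b> satisfies ccc(a) > ccc(b) > 0."""
    return unicodedata.combining(a) > unicodedata.combining(b) > 0

ACUTE = "\u0301"      # combining acute accent, ccc = 230
DOT_BELOW = "\u0323"  # combining dot below, ccc = 220

print(is_reorderable_pair(ACUTE, DOT_BELOW))  # True
print(is_reorderable_pair(DOT_BELOW, ACUTE))  # False
print(is_reorderable_pair("a", ACUTE))        # False: ccc("a") is 0

# Normalization swaps a reorderable pair into canonical order.
print(unicodedata.normalize("NFD", "a" + ACUTE + DOT_BELOW) == "a" + DOT_BELOW + ACUTE)
```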

[M] - Repertoire

(See character repertoire.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - repertoire

The collection of characters included in a character set. Also called a character repertoire.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Replacement Character

A character used as a substitute for an uninterpretable character from another encoding. The Unicode Standard uses U+FFFD replacement character for this function.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
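One place where this substitution is easy to observe is a decoder asked to replace input it cannot interpret. The sketch below uses the `errors="replace"` option of Python's `bytes.decode`; the byte string is a made-up example containing one byte that is never valid in UTF-8.

```python
raw = b"caf\xff"  # 0xFF cannot occur in well-formed UTF-8

decoded = raw.decode("utf-8", errors="replace")

print(decoded == "caf\ufffd")          # True
print(f"U+{ord(decoded[-1]):04X}")     # U+FFFD
```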

[M] - Replacement Glyph

A glyph used to render a character that cannot be rendered with the correct appearance in a particular font. It often is shown as an open or black rectangle. Also known as a missing glyph. (See Section 5.3, Unknown and Missing Characters.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Reserved (Withheld) name:

A string set aside for a potential allocation to a particular registrant (or registry in the case of TLDs in the root). The name is not allocated, but could be if/when certain conditions are met.
(source: ICANN, author: WG/VIP, text listed to be reviewed and possibly changed).

[M] - Reserved Code Point

Any code point of the Unicode Standard that is reserved for future assignment. Also known as an unassigned code point. (See definition D15 in Section 3.4, Characters and Encoding, and Section 2.4, Code Points and Characters.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Reserved Variant TLD

The Character Variant Label form(s) of a Variant TLD Set.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).

[M] - reverse IP

A method of translating an IP address into a domain name, so-called as it is the opposite of a typical lookup that converts a domain name to an IP address. Utilises PTR records in the IN-ADDR.ARPA zone for IPv4, and IP6.ARPA for IPv6.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
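The name under which a PTR record is looked up is derived mechanically from the address: reversed octets under in-addr.arpa for IPv4, reversed hexadecimal nibbles under ip6.arpa for IPv6. A minimal sketch using the `reverse_pointer` attribute of Python's `ipaddress` module follows; the wrapper name `reverse_name` is hypothetical and the addresses come from documentation ranges. No DNS query is made.

```python
import ipaddress

def reverse_name(address: str) -> str:
    """Return the domain name queried for the PTR record of an address."""
    return ipaddress.ip_address(address).reverse_pointer

print(reverse_name("192.0.2.1"))    # 1.2.0.192.in-addr.arpa
print(reverse_name("2001:db8::1"))  # 32 nibble labels followed by ip6.arpa
```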

[M] - RFC 1123

see hostname.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - RFC 1591

A document written by IANA staff in 1994 describing how they manage top-level domains. The document is well-referenced as it describes some of the key principles that govern the appointment of country-code top-level domains. Compare ICP-1.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - RFC 1918

See Private IP Addresses.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - RFC 3912

[M] - RFC 3927

[M] - RFC 812

See WHOIS.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - RFC 954

[M] - RFC3066

RFC3066: A system, widely used in the Internet, for coding and representing names of languages [RFC3066]. It is based on an International Organization for Standardization (ISO) standard for coding language names [ISO639], but expands it to provide additional precision.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).

[M] - RFCs

A series of Internet engineering documents describing Internet standards, as well as discussion papers, informational memorandums and best practices. Internet standards that are published in an RFC originate from the IETF. The RFC series is published by the RFC Editor.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Rich Text

Also known as styled text. The result of adding information to plain text. Examples of information that can be added include font data, color, formatting information, phonetic annotations, interlinear text, and so on. The Unicode Standard does not address the representation of rich text. It is expected that systems and applications will implement proprietary forms of rich text. Some public forms of rich text are available (for example, ODA, HTML, and SGML). When everything except primary content is removed from rich text, only plain text should remain.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - RIR

see Regional Internet Registry.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - R-LDH label

Reserved LDH labels, known as "tagged domain names" in some other contexts, have the property that they contain "--" in the third and fourth characters but which otherwise conform to LDH label rules. Only a subset of the R-LDH labels can be used in IDNA-aware applications. That subset consists of the class of labels that begin with the prefix "xn--" (case independent), but otherwise conform to the rules for LDH labels. That subset is called "XN-labels" in this set of documents. XN-labels are further divided into those whose remaining characters (after the "xn--") are valid output of the Punycode algorithm [RFC3492] and those that are not (see below). The XN-labels that are valid Punycode output are known as "A-labels" if they also meet the other criteria for IDNA-validity described below. Because LDH labels (and, indeed, any DNS label) must not be more than 63 octets in length, the portion of an XN-label derived from the Punycode algorithm is limited to no more than 59 ASCII characters.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
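The label classes described above can be told apart with a few string tests. The sketch below is an approximation under stated assumptions: the LDH rule is taken to mean ASCII letters, digits, and hyphens with no leading or trailing hyphen, the function name `classify_label` and the returned strings are hypothetical, and no attempt is made to decide whether an XN-label is a valid A-label.

```python
import re

MAX_LABEL_OCTETS = 63
ACE_PREFIX = "xn--"

# Letters, digits, hyphens; a hyphen may not start or end the label.
LDH_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

def classify_label(label: str) -> str:
    if len(label) > MAX_LABEL_OCTETS or not LDH_RE.fullmatch(label):
        return "not an LDH label"
    if label[2:4] != "--":
        return "LDH label (not reserved)"
    if label[:4].lower() == ACE_PREFIX:
        return "XN-label"
    return "R-LDH label (other)"

print(classify_label("example"))          # LDH label (not reserved)
print(classify_label("xn--bcher-kva"))    # XN-label
print(classify_label("ab--cd"))           # R-LDH label (other)
print(MAX_LABEL_OCTETS - len(ACE_PREFIX)) # 59 characters left for Punycode output
```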

[M] - romanization

The transliteration of a non-Latin script into Latin characters.
Because of the widespread use of Latin characters, people have tried to represent many languages that are not based on a Latin repertoire in Latin. For example, there are two popular romanizations of Chinese: Wade-Giles and Pinyin, the latter of which is by far more common today. Many romanization systems are inexact and do not give perfect round trip mappings between the native script and the Latin characters.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - root

the most central (or all-encompassing) authority of any naming or numbering system. Usually used to refer to the domain name system root (see Root Zone). However, IANA is also the root for IP addresses, and other systems.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Root Servers

the authoritative name servers for the Root Zone. These are considered unlike regular name servers in part because they are generally the most critical and heavily-used name servers. They are also special as they are not easily replaced, as changes to them need to be stored in every name server worldwide in a hints file.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Root Zone

The top of the domain name system hierarchy. The root zone contains all of the delegations for top-level domains, as well as the list of root servers, and is managed by IANA.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Root Zone Management

The management of the DNS Root Zone by IANA.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - rosgegrammeni

[M] - Row

A range of 256 contiguous Unicode code points, where the first code point is an integer multiple of 256. Two code points are in the same row if they share all but the last two hexadecimal digits. (See plane.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
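Since a row spans 256 code points, dropping the last two hexadecimal digits (the low 8 bits) of a code point identifies its row. A small sketch, with hypothetical function names:

```python
def row_of(cp: int) -> int:
    """Row index: the code point with its last two hex digits removed."""
    return cp >> 8

def same_row(a: int, b: int) -> bool:
    return row_of(a) == row_of(b)

print(same_row(0x0041, 0x00FF))  # True: both in row 0x00
print(same_row(0x00FF, 0x0100))  # False: 0x0100 starts the next row
print(hex(row_of(0x1F600)))      # 0x1f6
```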

[M] - RZM

see Root Zone Management.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - RZM Automation

A project to automate many aspects of the Root Zone Management function within IANA. Based on a software tool originally called “eIANA”.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - SAM

Acronym for Syriac abbreviation mark.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - SBCS

Acronym for single-byte character set. Any one-byte character encoding. This term is generally used in contrast with DBCS and/or MBCS.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Scalar Value

(See Unicode scalar value.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - script

A set of graphic characters used for the written form of one or more languages. Examples of scripts are Latin, Cyrillic, Greek, Arabic, and Han (the characters, often called ideographs after a subset of them, used in writing Chinese, Japanese, and Korean). RFC 2277 discusses scripts in detail.
It is common for internationalization novices to mix up the terms "language" and "script". This can be a problem in protocols that differentiate the two. Almost all protocols that are designed (or were re-designed) to handle non-ASCII text deal with scripts (the written systems) or characters, while fewer actually deal with languages.
A single name can mean either a language or a script; for example, "Arabic" is both the name of a language and the name of a script.
In fact, many scripts borrow their names from the names of languages. Further, many scripts are used for many languages; for example, the Russian and Bulgarian languages are written in the Cyrillic script. Some languages can be expressed using different scripts or were used with different scripts at different times; the Mongolian language can be written in either the Mongolian or Cyrillic scripts; Malay is primarily written in Latin script today but the earlier, Arabic-script-based, Jawi form is still in use; and a number of languages were converted from other scripts to Cyrillic in the first half of the last century, some of which have switched again more recently. Further, some languages are normally expressed with more than one script at the same time; for example, the Japanese language is normally expressed in the Kanji (Han), Katakana, and Hiragana scripts in a single string of text.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Script Table

A Script Table is a table of Unicode Code Points all having the same script property value. See Unicode Standard Annex #24.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).

[M] - Scriptio Continua

A writing style without spaces or punctuation.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - SCSU

Acronym for Standard Compression Scheme for Unicode. See Unicode Technical Standard #6, “A Standard Compression Scheme for Unicode.”
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - SCSU and BOCU-1

The Unicode Consortium has defined an encoding, SCSU [UTR6], which is designed to offer good compression for typical text. A different encoding that is meant to be MIME-friendly, BOCU-1, is described in [UTN6]. Although compression is attractive, as opposed to UTF-8, neither of these (at the time of this writing) has attracted much interest.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The compression provided as a side effect of the Punycode algorithm [RFC3492] is heavily used in some contexts, especially IDNA [RFC5890], but imposes some restrictions (See also Section 7).
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - SDNS

(O) See: Secure Data Network System.
(source: , author: , text listed to be reviewed and possibly changed).
This is the generic concept of the world digital ecosystem subsidiary domain name system.
(source: IUTF, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - SecretClass

a sequence of letters, numbers, and symbols that is used as a secret for access to some resource on a network (e.g., a password or passphrase).
(source: draft-blanchet-precis-framework-03, author: M. Blanchet, P. Saint-Andre, text listed to be reviewed and possibly changed).

[M] - secure entry point (SEP)

synonym for trust anchor.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - semiotics

the brain to brain protocol set.
(source: IUCG, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - Shaping Characters

Characters that assume different glyphic forms depending on the context.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Shift-JIS

A shifted encoding of the Japanese character encoding standard, JIS X 0208, widely deployed in PCs.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - SHOULD

This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
(source: RFC 2119, author: S. Bradner, text listed to be reviewed and possibly changed).

[M] - SHOULD NOT

This phrase, or the phrase "NOT RECOMMENDED" mean that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.
(source: RFC 2119, author: S. Bradner, text listed to be reviewed and possibly changed).

[M] - Singleton Decomposition

A canonical decomposition mapping to a single character other than the character itself. (Used in the definition of Unicode Normalization Forms.) (See definition D110 in Section 3.11, Normalization Forms.)
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).
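A rough way to spot such mappings is to inspect the raw decomposition field exposed by Python's `unicodedata.decomposition`: a canonical mapping carries no `<tag>` and a singleton lists exactly one code point. The function name `has_singleton_decomposition` is hypothetical.

```python
import unicodedata

def has_singleton_decomposition(ch: str) -> bool:
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):
        return False  # no mapping, or a compatibility (tagged) mapping
    return len(mapping.split()) == 1

print(has_singleton_decomposition("\u212B"))  # True: ANGSTROM SIGN maps to U+00C5
print(has_singleton_decomposition("\u00C5"))  # False: maps to two code points
print(has_singleton_decomposition("\u00B5"))  # False: compatibility mapping
```

One visible consequence is that a character with a singleton decomposition does not survive normalization: `unicodedata.normalize("NFC", "\u212B")` yields U+00C5.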

[M] - Sinogram

Chinese character. (See ideograph.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - SJIS

Acronym for Shift-JIS.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - slash [number]

(e.g. /24) See IP address block.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Small Letter

Synonym for lowercase letter. (See case.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Sorting

[M] - sorting and collation

Collating is the process of ordering units of textual information.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Collation is usually specific to a particular language or even to a particular application or locale. It is sometimes known as alphabetizing, although alphabetization is just a special case of sorting and collation. <UNICODE> Collation is concerned with the determination of the relative order of any particular pair of strings, and algorithms concerned with collation focus on the problem of providing appropriate weighted keys for string values, to enable binary comparison of the key values to determine the relative ordering of the strings.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The relative orders of letters in collation sequences can differ widely based on the needs of the system or protocol defining the collation order. For example, even within ASCII characters, there are two common and very different collation orders: "A, a, B, b,..." and "A, B, C, ..., Z, a, b,...", with additional variations for lower case first and digits before and after letters.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
In practice, it is rarely necessary to define a collation sequence for characters drawn from different scripts, but arranging such sequences so as to not surprise users is usually particularly problematic.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Sorting is the process of actually putting data records into specified orders, according to criteria for comparison between the records. Sorting can apply to any kind of data (including textual data) for which an ordering criterion can be defined. Algorithms concerned with sorting focus on the problem of performance (in terms of time, memory, or other resources) in actually putting the data records into the desired order.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
A sorting algorithm for string data can be internationalized by providing it with the appropriate collation-weighted keys corresponding to the strings to be ordered.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Many processes have a need to order strings in a consistent (sorted) sequence. For only a few CCS/CES combinations, there is an obvious sort order that can be applied without reference to the linguistic meaning of the characters: the code point order is sufficient for sorting. That is, the code point order is also the order that a person would use in sorting the characters. For many CCS/CES combinations, the code point order would make no sense to a person and therefore is not useful for sorting if the results will be displayed to a person.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Code Point order is usually not how any human educated by a local school system expects to see strings ordered; if one orders to the expectations of a human, one has a language-specific sort.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Sorting to code point order will seem inconsistent if the strings are not normalized before sorting because different representations of the same character will sort differently. This problem may be smaller with a language-specific sort.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
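The points above about collation keys, code point order, and normalization can be illustrated with a toy example. The key function below is an assumption made for the demonstration, not a language-specific collation; it only reproduces the "A, a, B, b" order for ASCII letters.

```python
import unicodedata

words = ["b", "A", "a", "B"]

# Code point order: all capitals sort before all small letters.
code_point_order = sorted(words)

# A simple weighted key: compare case-insensitively first, then by code point.
interleaved_order = sorted(words, key=lambda s: (s.lower(), s))

print(code_point_order)   # ['A', 'B', 'a', 'b']
print(interleaved_order)  # ['A', 'a', 'B', 'b']

# Two representations of the same character sort on opposite sides of "f"
# unless the strings are normalized first.
mixed = ["\u00e9", "f", "e\u0301"]
unnormalized = sorted(mixed)
normalized = sorted(unicodedata.normalize("NFC", s) for s in mixed)

print(unnormalized == ["e\u0301", "f", "\u00e9"])  # True
print(normalized == ["f", "\u00e9", "\u00e9"])     # True
```

Note that the sorting step itself (`sorted`) is unchanged throughout; only the keys it compares differ, which is the separation between sorting and collation described above.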

[M] - Spacing Mark

A combining character that is not a nonspacing mark. (See definition D55 in Section 3.6, Combination.) (See nonspacing mark.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - sponsored top-level domain

a sub-classification of generic top-level domain, where there is a formal community of interest that the domain is dedicated to serve.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - sponsoring organisation

The entity acting as the trustee of a top-level domain on behalf of its designated community. Sponsoring organisations are not assigned ownership of a domain; rather, they are custodians appointed by their local Internet community to act as proper stewards in that community’s best interests. The Sponsoring Organisation can generally be re-assigned, if the local Internet community wishes, using the redelegation process.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Standard Korean Syllable Block

(See Korean syllable block.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Starter

Any code point (assigned or not) with combining class of zero (ccc=0). (Used in the definition of Unicode Normalization Forms.) (See definition D107 in Section 3.11, Normalization Forms.)
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Static Form

[M] - STD 3

[M] - String Classes

IDNA2008 essentially defines a base string class of "internationalized domain name" to prepare domain names and hostnames, although it does not use the term "string class".
(source: draft-blanchet-precis-framework-03, author: M. Blanchet, P. Saint-Andre, text listed to be reviewed and possibly changed).

[M] - Stringprep

Stringprep [RFC3454] provides a model and character tables for preparing and handling internationalized strings. It was used in the original IDN specification (IDNA2003) via a profile called "Nameprep" [RFC3491]. It is no longer in use in IDNA, but continues to be used in profiles by a number of other protocols.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Styled Text

[M] - sub-domain

A domain that resides within another domain. For example, “www.icann.org” is a sub-domain of “icann.org”, and “icann.org” is a sub-domain of “org”. Sub-domains are entrusted to other entities through a process of delegation.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
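The containment relation can be sketched as a comparison of trailing labels. This is an illustrative helper (the name `is_subdomain` is an assumption) that works on plain ASCII names and says nothing about whether a delegation actually exists in the DNS.

```python
def is_subdomain(name: str, parent: str) -> bool:
    """Return True if `name` lies strictly within `parent`, compared label by label."""
    labels = name.rstrip(".").lower().split(".")
    parent_labels = parent.rstrip(".").lower().split(".")
    # A sub-domain has more labels than its parent and ends with the parent's labels.
    return (
        len(labels) > len(parent_labels)
        and labels[-len(parent_labels):] == parent_labels
    )


print(is_subdomain("www.icann.org", "icann.org"))   # the example from the definition
print(is_subdomain("noticann.org", "icann.org"))    # a shared suffix string is not enough
```

Comparing whole labels rather than raw string suffixes avoids treating "noticann.org" as part of "icann.org".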

[M] - Subtending Mark

A format character whose graphic form extends under a sequence of following characters, for example U+0600 ARABIC NUMBER SIGN.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Supplementary Character

A Unicode encoded character having a supplementary code point.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Supplementary Code Point

A Unicode code point between U+10000 and U+10FFFF.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Supplementary Planes

Planes 1 through 16, consisting of the supplementary code points.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Surrogate Character

A misnomer. It would be an encoded character having a surrogate code point, which is impossible. Do not use this term.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Surrogate Code Point

A Unicode code point in the range U+D800..U+DFFF. Reserved for use by UTF-16, where a pair of surrogate code units (a high surrogate followed by a low surrogate) “stand in” for a supplementary code point.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Surrogate Pair

A representation for a single abstract character that consists of a sequence of two 16-bit code units, where the first value of the pair is a high-surrogate code unit, and the second is a low-surrogate code unit. (See definition D75 in Section 3.8, Surrogates.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
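The arithmetic linking a supplementary code point to its surrogate pair can be sketched as follows. The function names are illustrative. The constants follow the UTF-16 scheme: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves.

```python
def to_surrogate_pair(code_point: int):
    """Split a supplementary code point into (high, low) surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits  -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # lower 10 bits -> low surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the supplementary code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")
```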

[M] - Syllable

(1) An element of a syllabary. (2) A basic unit of articulation that corresponds to a pulmonary pulse.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Syllable Block

A sequence of Korean characters that should be grouped into a single square cell for display. (See Section 3.12, Conjoining Jamo Behavior.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - symbol

One of a set of characters other than those used for letters, digits, or punctuation, and representing various concepts generally not connected to written language use per se. Examples of symbols include characters for mathematical operators, symbols for OCR, symbols for box-drawing or graphics, as well as symbols for dingbats, arrows, faces, and geometric shapes.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
Unicode has a property that identifies symbol characters.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
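One way to query such a property, assuming it refers to the Unicode General_Category values beginning with "S" (Sm, Sc, Sk, So), is the standard `unicodedata` module. The helper name `is_symbol` is illustrative.

```python
import unicodedata


def is_symbol(ch: str) -> bool:
    """Return True if `ch` has a General_Category in the Symbol group (S*)."""
    return unicodedata.category(ch).startswith("S")


for ch in ["+", "\u2192", "$", "A", "!"]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_symbol(ch))
```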

[M] - Symmetric Swapping

The process of rendering a character with a mirrored glyph when its resolved directionality is right-to-left in a bidirectional context. (See mirrored property and Unicode Standard Annex #9, “Unicode Bidirectional Algorithm.”)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
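The mirrored property mentioned here can be inspected with Python's `unicodedata.mirrored`. This sketch only reports the property. It does not perform the swap itself, because the mapping to the mirrored glyph is not exposed by that module. The helper name is an assumption.

```python
import unicodedata


def is_mirrored(ch: str) -> bool:
    """Return True if `ch` carries the Bidi_Mirrored property."""
    return unicodedata.mirrored(ch) == 1


# Parentheses are candidates for symmetric swapping; ordinary letters are not.
print(is_mirrored("("), is_mirrored("a"))
```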

[M] - Tagging

The association of attributes of text with a point or range of the primary text. The value of a particular tag is not generally considered to be a part of the “content” of the text. A typical example of tagging is to mark the language or the font for a portion of text.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Tailorable

A characteristic of an algorithm for which a higher-level protocol may specify different results than those specified in the algorithm. A tailorable algorithm without actual tailoring is also known as a default algorithm, and the results of an algorithm without tailoring are known as the default results.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Telecommunications

"Layer 0". Electrical exchanges (signals) over a plug to plug physical (copper, optics, sounds, etc.) bandwidth.
(source: ALFA/OSEX, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - TES

Acronym for transfer encoding syntax.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - TEX

Computer language designed for use in typesetting—in particular, for typesetting math and other technical material. (According to Knuth, TEX rhymes with the word blecchhh.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Text Element

A minimum unit of text in relation to a particular text process, in the context of a given writing system. In general, the mapping between text elements and code points is many-to-many. (See Chapter 2, General Structure.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Titlecase

Uppercased initial letter followed by lowercase letters in words. A casing convention often used in titles, headers, and entries, as exemplified in this glossary.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - TLD

see top-level domain.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Tonal Sandhi

A phonological process whereby the tone associated with one syllable in a tonal language influences the realization of a tone associated with a neighboring syllable.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Tone Mark

A diacritic or nonspacing mark that represents a phonemic tone. Tone languages are common in Southeast Asia and Africa. Because tones always accompany vowels (the syllabic nucleus), they are most frequently written using functionally independent marks attached to a vowel symbol. However, some writing systems such as Thai place tone marks on consonant symbols; Chinese does not use tone marks (except when it is written phonemically).
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Tonemic

Refers to the underlying, distinctive units of a tonal system in a language. Tones of a tonal language are often referred to by numbers (“tone 1,” “tone 2,” and so on), and each tone has an idealized, specific tone level or contour that is considered to be its tonemic value. The term was created by analogy with phonemic.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Tonetic

Refers to the surface, actual pitch realization of tones in a tonal system. Tonetic values are what can be directly measured by tracking pitch contours in actual speech recordings. The term was created by analogy with phonetic.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Tonos

The basic accent in modern Greek, having the form of an acute accent.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - top-level domain (TLD)

The highest level of subdivisions within the domain name system. These domains, such as “.COM” and “.UK”, are delegated from the DNS Root zone. They are generally divided into two distinct categories, generic top-level domains and country-code top-level domains.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Trailing Consonant

(1) In Korean, a jamo character with the Hangul_Syllable_Type property value Trailing_Jamo (in the range U+11A8..U+11F9). Abbreviated as T. (See definition D113 in Section 3.12, Conjoining Jamo Behavior.) (2) Any final consonant in a syllable.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Trailing Surrogate

Synonym for low-surrogate code unit.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Transcoding

Conversion of character data between different character sets.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - transcription

The process of systematically writing the sounds of some passage of spoken language, generally with the use of a technical phonetic alphabet (usually Latin-based) or other systematic transcriptional orthography. Transcription also sometimes refers to the conversion of written text into a transcribed form, based on the sound of the text as if it had been spoken. Unlike transliterations, which are generally designed to be round-trip convertible, transcriptions of written material are almost never round-trip convertible to their original form, at least without some supplemental information.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Transfer Encoding Syntax

A reversible transformation applied to text and other data to allow it to be transmitted—for example, Base64, uuencode.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
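The reversibility that characterises a transfer encoding syntax can be shown with Base64 from Python's standard library. The sample text is arbitrary.

```python
import base64

data = "K\u00f6pfe".encode("utf-8")      # arbitrary bytes, here UTF-8 text
encoded = base64.b64encode(data)         # ASCII-only form, safe for transmission
decoded = base64.b64decode(encoded)      # the transformation is reversible

print(encoded)
print(decoded == data)
```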

[M] - Transformation Format

A mapping from a coded character sequence to a unique sequence of code units (typically bytes).
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - translation

The process of conveying the meaning of some passage of text in one language, so that it can be expressed equivalently in another language. Many language translation systems are inexact and cannot be applied repeatedly to go from one language to another to another.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - transliteration

The process of representing the characters of an alphabetical or syllabic system of writing by the characters of a conversion alphabet. Many script transliterations are exact, and many have perfect round-trip mappings. The notable exception to this is romanization, described above. Transliteration involves converting text expressed in one script into another script, generally on a letter-by-letter basis. There are many official and unofficial transliteration standards, most notably those from ISO TC 46 and the U.S. Library of Congress.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Triangulation

(See pivot conversion.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - TRIP number

see Internet Telephony Administrative Domain (ITAD).
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - trust anchor

(I) /PKI/ An established point of trust (usually based on the authority of some person, office, or organization) from which a certificate user begins the validation of a certification path. (See: apex trust anchor, path validation, trust anchor CA, trust anchor certificate, trust anchor key.)
Usage: IDOCs that use this term SHOULD state a definition for it because it is used in various ways in existing IDOCs and other PKI literature. The literature almost always uses this term in a sense that is equivalent to this definition, but usage often differs with regard to what constitutes the point of trust.
Tutorial: A trust anchor may be defined as being based on a public key, a CA, a public-key certificate, or some combination or variation of those:
- 1. A public key as a point of trust: Although a certification path is defined as beginning with a "sequence of public-key certificates", an implementation of a path validation process might not explicitly handle a root certificate as part of the path, but instead begin the process by using a trusted root key to verify the signature on a certificate that was issued by the root.
Therefore, "trust anchor" is sometimes defined as just a public key. (See: root key, trust anchor key, trusted key.)
- 2. A CA as a point of trust: A trusted public key is just one of the data elements needed for path validation; the IPS path validation algorithm [R3280] also needs the name of the CA to which that key belongs, i.e., the DN of the issuer of the first X.509 certificate to be validated on the path. (See: issue.)
Therefore, "trust anchor" is sometimes defined as either just a CA (where some public key is implied) or as a CA together with a specified public key belonging to that CA. (See: root, trust anchor CA, trusted CA.)
Example: "A public key and the name of a [CA] that is used to validate the first certificate in a sequence of certificates. The trust anchor public key is used to verify the signature on a certificate issued by a trust anchor [CA]." [SP57]
- 3. A public-key certificate as a point of trust: Besides the trusted CA's public key and name, the path validation algorithm needs to know the digital signature algorithm and any associated parameters with which the public key is used, and also any constraints that have been placed on the set of paths that may be validated using the key. All of this information is available from a CA's public-key certificate.
Therefore, "trust anchor" is sometimes defined as a public-key certificate of a CA. (See: root certificate, trust anchor certificate, trusted certificate.)
- 4. Combinations: Combinations and variations of the first three definitions are also used in the PKI literature.
Example: "trust anchor information". The IPS standard for path validation [R3280] specifies the information that describes "a CA that serves as a trust anchor for the certification path. The trust anchor information includes: (a) the trusted issuer name, (b) the trusted public key algorithm, (c) the trusted public key, and (d) optionally, the trusted public key parameters associated with the public key. The trust anchor information may be provided to the path processing procedure in the form of a self-signed certificate. The trusted anchor information is trusted because it was delivered to the path processing procedure by some trustworthy out-of-band procedure. If the trusted public key algorithm requires parameters, then the parameters are provided along with the trusted public key."
(source: , author: , text listed to be reviewed and possibly changed).
A known good cryptographic certificate that can be used to validate a chain of trust.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - trust anchor repository (TAR)

Any repository of public keys that can be used as trust anchors for validating chains of trust. See Interim Trust Anchor Repository (ITAR) for one such repository for top-level domain operators using DNSSEC.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - trustee

An entity entrusted with the operations of an Internet resource for the benefit of the wider community. In IANA circles, usually in reference to the sponsoring organisation of a top-level domain.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Typographic Interaction

Graphical application of one nonspacing mark in a position relative to a grapheme base that is already occupied by another nonspacing mark, so that some rendering adjustment must be done (such as default stacking or side-by-side placement) to avoid illegible overprinting or crashing of glyphs. (See definition D106 in Section 3.11, Canonical Ordering Behavior.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UAX

Acronym for Unicode Standard Annex.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UCA

Acronym for Unicode Collation Algorithm.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UCD

Acronym for Unicode Character Database. (See Section 4.1, Unicode Character Database.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UCS

Acronym for Universal Character Set, which is specified by International Standard ISO/IEC 10646, which is equivalent in repertoire to the Unicode Standard.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UCS-2

ISO/IEC 10646 encoding form: Universal Character Set coded in 2 octets, limited to the Basic Multilingual Plane. (See Appendix C, Relationship to ISO/IEC 10646.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UCS-2 and UCS-4

UCS-2 and UCS-4 are the two encoding forms historically defined for ISO/IEC 10646. UCS-2 addresses only the BMP. Because many useful characters (such as many Han characters) have been defined outside of the BMP, many people consider UCS-2 to be obsolete.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
UCS-4 addresses the entire range of code points from ISO/IEC 10646 (by agreement between ISO/IEC JTC1 SC2 and the Unicode Consortium, a range from 0..0x10FFFF) as 32-bit values with zero padding to the left. UCS-4 is identical to UTF-32BE (without use of a BOM (see below)); UTF-32BE is now the preferred term.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
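The stated equivalence between UCS-4 and UTF-32BE without a BOM can be checked with a small sketch that writes each code point as a zero-padded 32-bit big-endian integer. The function name `ucs4_encode` is illustrative.

```python
import struct


def ucs4_encode(text: str) -> bytes:
    """Serialize each code point as a 32-bit big-endian value (zero-padded on the left)."""
    return b"".join(struct.pack(">I", ord(ch)) for ch in text)


sample = "A\U00010400"   # one BMP character and one supplementary character
print(ucs4_encode(sample).hex())
print(ucs4_encode(sample) == sample.encode("utf-32-be"))
```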

[M] - UCS-4

ISO/IEC 10646 encoding form: Universal Character Set coded in 4 octets. (See Appendix C, Relationship to ISO/IEC 10646.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - U-label

An IDNA-valid string of Unicode Code Points, in Normalization Form C (NFC) and including at least one non-ASCII character, expressed in a standard Unicode Encoding Form (such as UTF-8). It is also subject to the constraints about permitted characters that are specified in Section 4.2 of RFC 5891 and the rules in the Sections 2 and 3 of RFC 5892, the Bidi constraints in RFC 5893 if it contains any character from scripts that are written right to left, and the IDNA Symmetry Constraint. (RFC 5890)
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).
A "U-label" is an IDNA-valid string of Unicode characters, in Normalization Form C (NFC) and including at least one non-ASCII character, expressed in a standard Unicode Encoding Form (such as UTF-8). It is also subject to the constraints about permitted characters that are specified in Section 4.2 of the Protocol document and the rules in the Sections 2 and 3 of the Tables document, the Bidi constraints in that document if it contains any character from scripts that are written right to left, and the symmetry constraint described immediately below. Conversions between U-labels and A-labels are performed according to the "Punycode" specification [RFC 3492], adding or removing the ACE prefix as needed.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The Unicode representation of an internationalised domain name, i.e. how it is shown to the end-user. Contrast with A-label.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
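The conversion between U-labels and A-labels can be sketched with Python's built-in "punycode" codec plus the ACE prefix. This is only the encoding step. It assumes the input is already a single, lowercase, NFC-normalized label and performs none of the IDNA validity checks described above. The function names are hypothetical.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Punycode-encode a U-label and add the ACE prefix (no IDNA validation)."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Strip the ACE prefix and Punycode-decode an A-label (no IDNA validation)."""
    if not a_label.lower().startswith(ACE_PREFIX):
        raise ValueError("missing ACE prefix")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


print(to_a_label("b\u00fccher"))
print(to_u_label("xn--bcher-kva"))
```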

[M] - U-label form

U-label form is a direct representation of the Unicode characters using one of the encoding forms discussed above. This document discusses UTF-8 strings in many places. While all U-labels can be represented by UTF-8 strings, not all UTF-8 strings are valid U-labels (see Section 2.3.2 of the IDNA Definitions document [RFC 5890] for a discussion of these distinctions).
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).

[M] - Umlaut

Two horizontal dots over a letter, as in German Köpfe. The umlaut is not distinguished from the diaeresis in the Unicode character encoding. (See diaeresis.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
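That the encoding does not separate the two marks can be seen by decomposing a precomposed letter: the dots of the German umlaut come out as the single combining character named for the diaeresis. A small sketch using the standard `unicodedata` module:

```python
import unicodedata

# Decompose U+00F6 (the "ö" of Köpfe) into base letter plus combining mark.
decomposed = unicodedata.normalize("NFD", "\u00f6")
names = [unicodedata.name(ch) for ch in decomposed]
print(names)
```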

[M] - Unassigned

defines application behavior in the presence of code points that are unassigned, i.e. unknown for the version of Unicode the application is built upon.
(source: IDNA2008, author: Patrik Fältström, text listed to be reviewed and possibly changed).
Code points that either are reserved for future use or are never to be used.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
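Because "unassigned" is relative to a Unicode version, an application can only answer the question for the version it was built against. As a rough sketch, Python reports that version as `unicodedata.unidata_version` and gives reserved code points and noncharacters the General_Category "Cn". The helper name `is_unassigned` is illustrative.

```python
import unicodedata


def is_unassigned(ch: str) -> bool:
    """Return True if `ch` is unassigned (category Cn) in this build's Unicode data."""
    return unicodedata.category(ch) == "Cn"


print("Unicode data version:", unicodedata.unidata_version)
# U+FFFF is a noncharacter (never to be used); "A" is an assigned letter.
print(is_unassigned("\uffff"), is_unassigned("A"))
```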

[M] - Unassigned Character

Synonym for not assigned to an abstract character. This refers to surrogate code points, noncharacters, and reserved code points. (See Section 2.4, Code Points and Characters.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Unassigned Code Point

Synonym for reserved code point.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Undesignated Code Point

[M] - undisplayable character

A character that has no displayable form. For instance, the zero-width space (U+200B) cannot be displayed because it takes up no horizontal space. Formatting characters such as those for setting the direction of text are also undisplayable. Note, however, that every character in [UNICODE] has a glyph associated with it, and that the glyphs for undisplayable characters are enclosed in a dashed square as an indication that the actual character is undisplayable.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The property of a character that causes it to be undisplayable is intrinsic to its definition. Undisplayable characters can never be displayed in normal text (the dashed square notation is used only in special circumstances). Printable characters whose Unicode definitions are associated with glyphs that cannot be rendered on a particular system are not, in this sense, undisplayable.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Unicameral

A script that has no case distinctions. Most often used in the context of European alphabets.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UNICODE

Unicode is a list of characters (including non-spacing marks that are used to form some other characters), where each character is assigned an integer value, called a code point. In simple terms a Unicode string is a string of integer code point values in the range 0 to 1,114,111 (10FFFF in base 16). These integer code points must be encoded using some mechanism before they can be transmitted in network packets, stored in memory, stored on disk, etc. Some common ways of encoding these integer code point values in computer systems include UTF-8, UTF-16, and UTF-32. In addition to the material below, those forms and the tradeoffs among them are discussed in Chapter 2 of The Unicode Standard.
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).
A standard describing a repertoire of characters used to represent most of the world's languages in written form. The collection of scripts used to do this is maintained by the Unicode Consortium and is constantly growing. Unicode is the basis for internationalised domain names.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
(1) The standard for digital representation of the characters used in writing all of the world's languages. Unicode provides a uniform means for storing, searching, and interchanging text in any language. It is used by all modern computers and is the foundation for processing text on the Internet. Unicode is developed and maintained by the Unicode Consortium: http://www.unicode.org. (2) A label applied to software internationalization and localization standards developed and maintained by the Unicode Consortium.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Unicode Algorithm

The logical description of a process used to achieve a specified result involving Unicode characters. (See definition D17 in Section 3.4, Characters and Encoding.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Unicode Character

Unicode Character: The term "Unicode character" is used here in reference to characters chosen from the Unicode Standard Version 3.2 [UNICODE] (and hence from ISO/IEC 10646). In this document, the characters are identified by their positions, or "Code Points." The notation U+12AB, for example, indicates the character at the position 12AB (hexadecimal) in the Unicode 3.2 table. For characters in positions above FFFF, i.e., requiring more than sixteen bits to represent, a five to eight-character string is used, such as U+112AB for the character in position 12AB of plane 1.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).
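The U+ notation described here can be produced mechanically: hexadecimal, upper case, padded to at least four digits. The sketch below also derives the plane number and the position within the plane. Both function names are assumptions made for illustration.

```python
def u_notation(ch: str) -> str:
    """Format a character's code point in U+XXXX notation (at least four hex digits)."""
    return f"U+{ord(ch):04X}"


def plane_and_position(ch: str):
    """Return (plane number, position within that plane) for a character."""
    cp = ord(ch)
    return cp >> 16, cp & 0xFFFF


print(u_notation("A"), u_notation("\U000112AB"))
plane, position = plane_and_position("\U000112AB")
print(plane, f"{position:04X}")
```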

[M] - Unicode Character Database

A collection of files providing normative and informative Unicode character properties and mappings. (See Chapter 4, Character Properties, and the Unicode Character Database.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Unicode Collation Algorithm

Tailorable text comparison mechanism used for searching, sorting, and matching Unicode strings. See Unicode Technical Standard #10, “Unicode Collation Algorithm.”
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Unicode Common Locale Data Repository

The repository of locale data in XML format maintained by the Unicode Consortium (http://www.unicode.org/cldr/). This repository provides information needed in the localization of software products into a wide variety of languages, supplying (among other things): date, time, number, and currency formats; sorting, searching, and matching information; and translated names for languages, territories, scripts, currencies, and time zones. (See also Locale Data Markup Language.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
A standards development organization creating widely-used specifications related to character encoding, as well as for software internationalization and localization. Major projects are the Unicode Standard and the Unicode Locales Project, which defines repositories of standardized data needed to develop software for particular regions and cultures. The Consortium was founded in 1991, and is headquartered in Mountain View, California. Its current members include major software corporations, governments, and academic institutions. See http://www.unicode.org.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Unicode Consortium

The second important group for international character standards is the Unicode Consortium. The Unicode Consortium is a trade association of companies, governments, and other groups interested in promoting the Unicode Standard [UNICODE]. The Unicode Standard is a CCS whose repertoire and code points are identical to ISO/IEC 10646. The Unicode Consortium has added features to the base CCS which make it more useful in protocols, such as defining attributes for each character. Examples of these attributes include case conversion and numeric properties.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The actual technical and definitional work of the Unicode Consortium is done in the Unicode Technical Committee (UTC). The terms "UTC" and "Unicode Consortium" are often treated, imprecisely, as synonymous in the IETF.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
The Unicode Consortium publishes addenda to the Unicode Standard as Unicode Technical Reports. There are many types of technical reports at various stages of maturity. The Unicode Standard and affiliated technical reports can be found at <http://www.unicode.org/>.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
A reciprocal agreement between the Unicode Consortium and ISO/IEC JTC 1/SC 2 provides for ISO/IEC 10646 and The Unicode Standard to track each other for definitions of characters and assignments of code points. Updates, often in the form of amendments, to the former sometimes lag updates to the latter for a short period, but the gap has rarely been significant in recent years.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
At the time that the IETF character set policy [RFC2277] was established and the first version of this terminology specification was published, there was a strong preference in the IETF community for references to ISO/IEC 10646 (rather than Unicode) when possible. That preference largely reflected a more general IETF preference for referencing established open international standards in preference to specifications from consortia. However, the Unicode definitions of character properties and classes are not part of ISO/IEC 10646. Because IETF specifications are increasingly dependent on those definitions (for example, see the explanation in Section 4.2) and the Unicode specifications are freely available online in convenient machine-readable form, the IETF's preference has shifted to referencing the Unicode Standard. The latter is especially important when version consistency between code points (either standard) and Unicode properties (Unicode only) is required.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Unicode Encoding Form

A character encoding form that assigns each Unicode scalar value to a unique code unit sequence. The Unicode Standard defines three Unicode encoding forms: UTF-8, UTF-16, and UTF-32. (See definition D79 in Section 3.9, Unicode Encoding Forms.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
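To make the three encoding forms concrete, the sketch below counts how many code units each form needs for a given string: 8-bit units for UTF-8, 16-bit units for UTF-16, 32-bit units for UTF-32. The helper name `code_units` is illustrative. The byte order chosen for the intermediate serialization does not affect the counts.

```python
def code_units(text: str) -> dict:
    """Count the code units needed for `text` in each Unicode encoding form."""
    return {
        "UTF-8": len(text.encode("utf-8")),             # 1 byte per code unit
        "UTF-16": len(text.encode("utf-16-le")) // 2,   # 2 bytes per code unit
        "UTF-32": len(text.encode("utf-32-le")) // 4,   # 4 bytes per code unit
    }


print(code_units("\u00e9"))        # a BMP character
print(code_units("\U0001F600"))    # a supplementary character
```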

[M] - Unicode Encoding Scheme

A specified byte serialization for a Unicode encoding form, including the specification of the handling of a byte order mark (BOM), if allowed. (See definition D94 in Section 3.10, Unicode Encoding Schemes.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Unicode Locale Data Markup Language

The XML specification for the exchange of locale data, defined by Unicode Technical Standard #35, "Unicode Locale Data Markup Language (LDML)." (See also Unicode Common Locale Data Repository.)
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Unicode Scalar Value

Any Unicode code point except high-surrogate and low-surrogate code points. In other words, the ranges of integers 0 to D7FF and E000 to 10FFFF (hexadecimal) inclusive. (See definition D76 in Section 3.9, Unicode Encoding Forms.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
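The two ranges translate directly into a range test. A minimal sketch, with an illustrative function name:

```python
def is_scalar_value(code_point: int) -> bool:
    """Return True if `code_point` is a Unicode scalar value (not a surrogate)."""
    return 0 <= code_point <= 0xD7FF or 0xE000 <= code_point <= 0x10FFFF


for cp in (0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000):
    print(f"{cp:X}", is_scalar_value(cp))
```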

[M] - Unicode Signature

An implicit marker to identify a file as containing Unicode text in a particular encoding form. An initial byte order mark (BOM) may be used as a Unicode signature.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
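A signature check can be sketched with the BOM constants from Python's `codecs` module. The function name `sniff_signature` is hypothetical. The result is only a hint, since a file without a BOM may still be Unicode text, and the bytes FF FE 00 00 could in principle also be a UTF-16LE BOM followed by a NUL character.

```python
import codecs

# Longer signatures are tested first: the UTF-32LE BOM starts with the UTF-16LE BOM bytes.
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_signature(data: bytes):
    """Return the encoding suggested by an initial BOM, or None if there is none."""
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name
    return None


print(sniff_signature(codecs.BOM_UTF8 + b"text"))
print(sniff_signature(b"text"))
```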

[M] - Unicode Standard Annex

An integral part of the Unicode Standard published as a separate document.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Unicode String

A code unit sequence containing code units of a particular Unicode encoding form. (See definition D80 in Section 3.9, Unicode Encoding Forms.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
Unicode String: "Unicode string" refers to a string of Unicode characters. The Unicode string is identified by the sequence of the Unicode characters regardless of the encoding scheme.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).
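The RFC 3743 sense of the term can be illustrated by decoding two different byte sequences and comparing the resulting character sequences. This is a minimal sketch; `same_unicode_string` is a hypothetical helper name.

```python
def same_unicode_string(bytes_a: bytes, scheme_a: str,
                        bytes_b: bytes, scheme_b: str) -> bool:
    """Compare two serialized texts by their sequence of Unicode characters."""
    return bytes_a.decode(scheme_a) == bytes_b.decode(scheme_b)

as_utf8 = "\u65e5\u672c".encode("utf-8")       # 6 bytes
as_utf16 = "\u65e5\u672c".encode("utf-16-le")  # 4 bytes

# The bytes differ, but both identify the same Unicode string.
print(as_utf8 != as_utf16)
print(same_unicode_string(as_utf8, "utf-8", as_utf16, "utf-16-le"))
```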

[M] - Unicode Technical Note

Informative publication containing information of possible interest concerning the Unicode Standard or related topics.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Unicode Technical Report

Formally approved Unicode Consortium publication containing informative technical analysis of a topic related to the Unicode Standard.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Unicode Technical Standard

Formally approved specification published by the Unicode Consortium that is related to, but not part of, the Unicode Standard.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Unicode Transformation Format

An ambiguous synonym for either Unicode encoding form or Unicode encoding scheme. The latter terms are now preferred.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Unification

The process of identifying characters that are in common among writing systems.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Universalization

Only the digits 0 to 9, or the hexadecimal digits 0 to F, are used in protocols.
(source: IUCG, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - unsponsored top-level domain

A sub-classification of generic top-level domain, where there is no formal community of interest.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UPA

Acronym for Uralic Phonetic Alphabet.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Uppercase

[M] - URO

Acronym for Unified Repertoire and Ordering, the original set of CJK unified ideographs used in the Unicode Standard.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - User-Defined Character

(See EUDC.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - User-Perceived Character

What everyone thinks of as a character in their script.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTF

Acronym for Unicode (or UCS) Transformation Format.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTF-16

UTF-16 is a mechanism for encoding a Unicode code point in one or two 16-bit integers, described in detail in Sections 3.9 and 3.10 of The Unicode Standard [Unicode]. A UTF-16 string encodes a string of integer code point values that represent a string of Unicode characters.
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).
A multibyte encoding for text that represents each Unicode character with 2 or 4 bytes; it is not backward-compatible with ASCII. It is the internal form of Unicode in many programming languages, such as Java, C#, and JavaScript, and in many operating systems. More technically (1) The UTF-16 encoding form. (2) The UTF-16 encoding scheme. (3) “Transformation format for 16 planes of Group 00,” defined in Annex C of ISO/IEC 10646:2003; technically equivalent to the definitions in the Unicode Standard.
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).
(1) The UTF-16 encoding form. (2) The UTF-16 encoding scheme. (3) “Transformation format for 16 planes of Group 00,” defined in Annex C of ISO/IEC 10646-2003; technically equivalent to the definitions in the Unicode Standard.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTF-16 Encoding Form

The Unicode encoding form that assigns each Unicode scalar value in the ranges U+0000..U+D7FF and U+E000..U+FFFF to a single unsigned 16-bit code unit with the same numeric value as the Unicode scalar value, and that assigns each Unicode scalar value in the range U+10000..U+10FFFF to a surrogate pair, according to Table 3-5, “UTF-16 Bit Distribution.” (See definition D91 in Section 3.9, Unicode Encoding Forms.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
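The assignment described above can be sketched as a short function. The name `utf16_code_units` is an assumption; the arithmetic follows the usual surrogate-pair rule (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves).

```python
def utf16_code_units(scalar: int) -> list[int]:
    """Return the UTF-16 code unit sequence for one Unicode scalar value."""
    if not (0 <= scalar <= 0xD7FF or 0xE000 <= scalar <= 0x10FFFF):
        raise ValueError("not a Unicode scalar value: %#x" % scalar)
    if scalar <= 0xFFFF:
        # Single 16-bit code unit with the same numeric value.
        return [scalar]
    offset = scalar - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # low 10 bits -> low surrogate
    return [high, low]

print([hex(u) for u in utf16_code_units(0x20AC)])
print([hex(u) for u in utf16_code_units(0x10400)])
```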

[M] - UTF-16 Encoding Scheme

The Unicode encoding scheme that serializes a UTF-16 code unit sequence as a byte sequence in either big-endian or little-endian format. (See definition D98 in Section 3.10, Unicode Encoding Schemes.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
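To make the difference between the two byte orders concrete, the following sketch serializes the same UTF-16 code units both ways, optionally with a leading byte order mark. The function `serialize_utf16` and its parameters are assumptions for illustration; the codecs and BOM constants are those of the Python standard library.

```python
import codecs

def serialize_utf16(text: str, byte_order: str = "big",
                    with_bom: bool = False) -> bytes:
    """Serialize `text` as UTF-16 bytes in the requested byte order."""
    if byte_order == "big":
        codec, bom = "utf-16-be", codecs.BOM_UTF16_BE
    elif byte_order == "little":
        codec, bom = "utf-16-le", codecs.BOM_UTF16_LE
    else:
        raise ValueError("byte_order must be 'big' or 'little'")
    body = text.encode(codec)
    return bom + body if with_bom else body

text = "A\u20ac"
print(serialize_utf16(text, "big").hex())     # 004120ac
print(serialize_utf16(text, "little").hex())  # 4100ac20

# With a BOM, a generic "utf-16" decoder can recover the byte order.
print(serialize_utf16(text, "little", with_bom=True).decode("utf-16") == text)
```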

[M] - UTF-16BE

The Unicode encoding scheme that serializes a UTF-16 code unit sequence as a byte sequence in big-endian format. (See definition D96 in Section 3.10, Unicode Encoding Schemes.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTF-16LE

The Unicode encoding scheme that serializes a UTF-16 code unit sequence as a byte sequence in little-endian format. (See definition D97 in Section 3.10, Unicode Encoding Schemes.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTF-2

Obsolete name for UTF-8.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTF-32

UTF-32 (formerly UCS-4), also described in Sections 3.9 and 3.10 of The Unicode Standard [Unicode], is a mechanism for encoding a Unicode code point in a single 32-bit integer. A UTF-32 string is thus a string of 32-bit integer code point values, which represent a string of Unicode characters. Note that UTF-16 results in some all-zero octets when code points occur early in the Unicode sequence, and UTF-32 always has all-zero octets.
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).
The Unicode Consortium and ISO/IEC JTC 1 have defined UTF-32 as a transformation format that incorporates the integer code point value right-justified in a 32 bit field. As with UTF-16, the byte order mark (BOM) can be used and UTF-32BE and UTF-32LE are defined. UTF-32 and UCS-4 are essentially equivalent and the terms are often used interchangeably.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
A multibyte encoding for text that represents each Unicode character with 4 bytes; it is not backward-compatible with ASCII. More technically (1) The UTF-32 encoding form. (2) The UTF-32 encoding scheme.
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).
(1) The UTF-32 encoding form. (2) The UTF-32 encoding scheme.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
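The RFC 6055 remark about all-zero octets can be checked by counting zero bytes in each serialization. This is a small sketch; `zero_octets` is a hypothetical helper name.

```python
def zero_octets(text: str, scheme: str) -> int:
    """Count the all-zero octets in `text` serialized with `scheme`."""
    return text.encode(scheme).count(0)

for scheme in ("utf-8", "utf-16-be", "utf-32-be"):
    # ASCII-range text: no zero octets in UTF-8, one per character in
    # UTF-16, three per character in UTF-32.
    print(scheme, zero_octets("abc", scheme))
```

This is one reason UTF-16 and UTF-32 data cannot safely pass through interfaces that treat a zero octet as a string terminator.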

[M] - UTF-32 Encoding Form

The Unicode encoding form that assigns each Unicode scalar value to a single unsigned 32-bit code unit with the same numeric value as the Unicode scalar value. (See definition D90 in Section 3.9, Unicode Encoding Forms.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTF-32 Encoding Scheme

The Unicode encoding scheme that serializes a UTF-32 code unit sequence as a byte sequence in either big-endian or little-endian formats. (See definition D101 in Section 3.10, Unicode Encoding Schemes.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTF-32BE

The Unicode encoding scheme that serializes a UTF-32 code unit sequence as a byte sequence in big-endian format. (See definition D99 in Section 3.10, Unicode Encoding Schemes.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTF-32LE

The Unicode encoding scheme that serializes a UTF-32 code unit sequence as a byte sequence in little-endian format. (See definition D100 in Section 3.10, Unicode Encoding Schemes.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTF-7

Unicode (or UCS) Transformation Format, 7-bit encoding form, specified by RFC 2152.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTF-8

UTF-8 is a mechanism for encoding a Unicode code point in a variable number of 8-bit octets, where an ASCII code point is preserved as-is.
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).
Those octets encode a string of integer code point values, which represent a string of Unicode characters. The authoritative definition of UTF-8 is in Sections 3.9 and 3.10 of The Unicode Standard, but a description of UTF-8 encoding can also be found in RFC 3629. Descriptions and formulae can also be found in Annex D of ISO/IEC 10646-1.
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).
UTF-8 [RFC3629] is the preferred encoding for IETF protocols.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
A standard used for transmitting Unicode characters.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
A multibyte encoding for text that represents each Unicode character with 1 to 4 bytes, and which is backward-compatible with ASCII. UTF-8 is the predominant form of Unicode in web pages. More technically (1) The UTF-8 encoding form. (2) The UTF-8 encoding scheme. (3) “UCS Transformation Format 8,” defined in Annex D of ISO/IEC 10646:2003, technically equivalent to the definitions in the Unicode Standard.
(source: Unicode glossary, author: undisclosed, text listed to be reviewed and possibly changed).
(1) The UTF-8 encoding form. (2) The UTF-8 encoding scheme. (3) “UCS Transformation Format 8,” defined in Annex D of ISO/IEC 10646-2003, technically equivalent to the definitions in the Unicode Standard.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTF-8 Encoding Form

The Unicode encoding form that assigns each Unicode scalar value to an unsigned byte sequence of one to four bytes in length, as specified in Table 3-6. (See definition D92 in Section 3.9, Unicode Encoding Forms.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
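The one-to-four-byte assignment can be sketched by hand and compared against a library encoder. The function name `utf8_encode_scalar` is an assumption; the bit layout is the standard UTF-8 bit distribution (lead byte prefixes 0, 110, 1110, 11110 and continuation bytes of the form 10xxxxxx).

```python
def utf8_encode_scalar(scalar: int) -> bytes:
    """Encode one Unicode scalar value as a UTF-8 byte sequence."""
    if not (0 <= scalar <= 0xD7FF or 0xE000 <= scalar <= 0x10FFFF):
        raise ValueError("not a Unicode scalar value: %#x" % scalar)
    if scalar <= 0x7F:        # 1 byte: 0xxxxxxx (ASCII preserved as-is)
        return bytes([scalar])
    if scalar <= 0x7FF:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (scalar >> 6),
                      0x80 | (scalar & 0x3F)])
    if scalar <= 0xFFFF:      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (scalar >> 12),
                      0x80 | ((scalar >> 6) & 0x3F),
                      0x80 | (scalar & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (scalar >> 18),
                  0x80 | ((scalar >> 12) & 0x3F),
                  0x80 | ((scalar >> 6) & 0x3F),
                  0x80 | (scalar & 0x3F)])

for cp in (0x41, 0xE9, 0x20AC, 0x10400):
    # Each result should agree with Python's built-in UTF-8 encoder.
    print(hex(cp), utf8_encode_scalar(cp).hex(),
          utf8_encode_scalar(cp) == chr(cp).encode("utf-8"))
```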

[M] - UTF-8 Encoding Scheme

The Unicode encoding scheme that serializes a UTF-8 code unit sequence in exactly the same order as the code unit sequence itself. (See definition D95 in Section 3.10, Unicode Encoding Schemes.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTN

Acronym for Unicode Technical Note.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTR

Acronym for Unicode Technical Report.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - UTS

Acronym for Unicode Technical Standard.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Valid

Defines which code points and character categories are treated as valid input to preparation of the string.
(source: IDNA2008, author: Patrik Fältström, text listed to be reviewed and possibly changed).

[M] - Valid Code Point

In a Language Variant Table, the list of Code Points that is permitted at registration time for that language. Any other Code Points, or any string containing them, will be rejected. The Valid Code Point list appears as the first column of the Language Variant Table. (RFC 3743) Note that Valid Code Points are always both Assigned Code Points and Variant Members.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).
Valid Code Point: In a Language Variant Table, the list of Code Points that is permitted for that language. Any other Code Points, or any string containing them, will be rejected by this specification. The Valid Code Point list appears as the first column of the Language Variant Table.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).
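The rejection rule can be sketched as a membership test against the first column of a Language Variant Table. The table contents below are purely hypothetical (lowercase a to z plus U+00E9) and are not taken from any published table; the function names are likewise assumptions.

```python
# Hypothetical Valid Code Point list, standing in for the first column
# of a Language Variant Table. Not taken from any real registry table.
VALID_CODE_POINTS = set(range(ord("a"), ord("z") + 1)) | {0x00E9}

def rejected_code_points(label: str, valid=VALID_CODE_POINTS) -> list[int]:
    """Return the code points in `label` that are not Valid Code Points."""
    return [cp for cp in map(ord, label) if cp not in valid]

def is_permitted(label: str, valid=VALID_CODE_POINTS) -> bool:
    """A label is permitted only if every code point in it is valid."""
    return not rejected_code_points(label, valid)

print(is_permitted("caf\u00e9"))                               # all code points valid
print([hex(cp) for cp in rejected_code_points("caf\u00e8")])   # U+00E8 is not listed
```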

[M] - Valid labels

IDNA specifies validity of a label, such as what characters it can contain, relationships among them, and so on, in Unicode terms. Valid labels can be in either "U-label" or "A-label" form, with the appropriate one determined by particular protocols or by context.
(source: RFC 6055, author: D. Thaler, J. Klensin, S. Cheshire, text listed to be reviewed and possibly changed).

[M] - Varia

Greek term for grave accent, used in polytonic Greek character names.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).


[M] - Variant

In the context of internationalised domain names, an alternative domain name that can be registered, or mean the same thing, because some of its characters can be registered in multiple different ways due to the way the language works. Depending on registry policy, variants may be registered together in one block called a variant bundle. For example, “internationalise” and “internationalize” may be considered variants in English.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
Input to a process that can be used in place of another input without affecting the outcome of that process.
(source: ALFA, author: JFC Morfin, text listed to be reviewed and possibly changed).
The differences that make no difference.
(source: ALFA, author: JFC Morfin, text listed to be reviewed and possibly changed).

[M] - variant bundle

A collection of multiple domain names that are grouped together because some of the characters are considered variants of the others.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Variant Character Collection

All the characters listed in a single row of a Language Variant Table, as any of Valid Code Point, Preferred Variant, or Character Variant. (RFC 3743) It is important to recognize that the relationship may not be reciprocal (that is, if foo is a Valid Code Point and bar is a Character Variant, that does not mean that foo is a Character Variant for Valid Code Point bar).
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).

[M] - Variant Label Set

A set of U-labels consisting of one Fundamental Label, zero or more Preferred Variant Labels, and zero or more Character Variant Labels.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).

[M] - Variant Members

Code Points that appear in a Language Variant Table. The code point may appear in any of the Valid Code Point, Preferred Variant, or Character Variant positions.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).

[M] - variant table

A type of IDN table that describes the variants for a particular language or script. For example, a variant table may map Simplified Chinese characters to Traditional Chinese characters for the purpose of constructing a variant bundle.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Variant TLD

A Variant Domain Name Label corresponding to an A-label that appears or is intended to appear immediately below the root in the global DNS. Note that this definition includes TLDs that do not actually exist in the DNS at a given point in time. More informally, a Variant Domain Name Label that appears or is intended to appear immediately below the DNS root. Because the actual labels in the DNS are all A-labels, this informal use is not strictly true; but because A-labels and U-labels are symmetric, it amounts to the same thing.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).

[M] - Variant TLD Set

A set of Variant TLDs consisting of one Fundamental Label, zero or more Preferred Variant Labels, and zero or more Character Variant Labels.
(source: ICANN, author: VIP Team, text listed to be reviewed and possibly changed).

[M] - Virama

From Sanskrit. The name of a sign used in many Indic and other Brahmi-derived scripts to suppress the inherent vowel of the consonant to which it is applied, thereby generating a dead consonant. (See Section 9.1, Devanagari.) The sign varies in shape from script to script, and may be known by other names in various languages. For example, in Hindi it is known as hal or halant, in Bangla it is called hasant, and in Tamil it is called pulli.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Visual Ambiguity

A situation arising from two characters (or sequences of characters) being rendered indistinguishably.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Visual Order

Characters ordered as they are presented for reading. (Contrast with logical order.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Vocalization

Marks placed above, below, or within consonants to indicate vowels or other aspects of pronunciation. A feature of Middle Eastern scripts.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Vowel

In Korean, a jamo character with the Hangul_Syllable_Type property value Vowel_Jamo (in the range U+1161..U+11A2 or U+1160 hangul jungseong filler). Abbreviated as V. (See definition D110 in Section 3.12, Conjoining Jamo Behavior.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Vowel Mark

In many scripts, a mark used to indicate a vowel or vowel quality.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Vrachy

Greek term for breve accent, used in polytonic Greek character names.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - W3C

(N) See: World Wide Web Consortium.
(source: , author: , text listed to be reviewed and possibly changed).
Acronym for World Wide Web Consortium.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - wchar_t

The ANSI C defined wide character type, usually implemented as either 16 or 32 bits. ANSI specifies that wchar_t be an integral type and that the C language source character set be mappable by simple extension (zero- or sign-extension).
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Well-Formed Code Unit Sequence

A code unit sequence that follows the specification of a Unicode encoding form. (See definition D85 in Section 3.9, Unicode Encoding Forms.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
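One practical way to test well-formedness for UTF-8 is to attempt a strict decode. The sketch below assumes that Python 3's built-in UTF-8 decoder rejects ill-formed sequences such as overlong forms, encoded surrogates and truncated sequences; `is_well_formed_utf8` is a hypothetical helper name.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """True if `data` decodes as UTF-8 under strict error handling."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

print(is_well_formed_utf8(b"\xe2\x82\xac"))  # U+20AC, well-formed
print(is_well_formed_utf8(b"\xc0\x80"))      # overlong form of U+0000
print(is_well_formed_utf8(b"\xed\xa0\x80"))  # encoded surrogate U+D800
print(is_well_formed_utf8(b"\xe2\x82"))      # truncated sequence
```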

[M] - WHOIS database

Used to refer to parts of a registry’s database that are made public using the WHOIS protocol, or via similar mechanisms using other protocols (such as web pages, or IRIS). Most commonly used to refer to a domain name registry’s public database.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - WHOIS gateway

An interface, usually a web-based form, that will perform a look-up to a WHOIS server. This allows one to find WHOIS information without needing a specialised computer program that speaks the WHOIS protocol.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - WHOIS protocol

[M] - WHOIS server

A system running on port number 43 that accepts queries using the WHOIS protocol.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
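A client for such a server can be sketched in a few lines: open a TCP connection to port 43, send the query followed by CR LF, and read until the server closes the connection. The function names and the choice of ASCII for the query and UTF-8 (with replacement) for the reply are assumptions of this sketch; the server host name must be supplied by the caller.

```python
import socket

WHOIS_PORT = 43

def build_query(name: str) -> bytes:
    """Format a WHOIS request: the query text terminated by CR LF."""
    # Assumption: the query is ASCII (for example an A-label domain name).
    return name.encode("ascii") + b"\r\n"

def whois(server: str, name: str, timeout: float = 10.0) -> str:
    """Send one query to `server` on port 43 and return the full reply."""
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as conn:
        conn.sendall(build_query(name))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    # Assumption: decode leniently, since the reply encoding is not signalled.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

A call such as `whois(server, "example.org")` needs network access and a reachable WHOIS server, so it is not run here.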

[M] - wire format

The format of data when it is transmitted over the Internet (i.e. “over the wire”). For example, an A-label is the wire format of an internationalised domain name; and UTF-8 is a possible wire format of Unicode.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
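Both wire formats mentioned in this entry can be produced with Python's standard codecs, as the sketch below shows. Note the assumption: Python's built-in `idna` codec implements the older IDNA 2003 rules rather than IDNA2008, although the result is the same for this simple label. The helper names are hypothetical.

```python
def to_a_label(u_label: str) -> bytes:
    """Convert a single U-label to its A-label (ACE, "xn--") wire form."""
    return u_label.encode("idna")

def to_utf8_wire(text: str) -> bytes:
    """Serialize Unicode text as UTF-8 octets."""
    return text.encode("utf-8")

label = "b\u00fccher"
print(to_a_label(label))     # b'xn--bcher-kva'
print(to_utf8_wire(label))   # b'b\xc3\xbccher'
print(to_a_label(label).decode("idna") == label)
```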

[M] - World Wide Web Consortium (W3C)

(N) Created in October 1994 to develop and standardize protocols to promote the evolution and interoperability of the Web, and now consisting of hundreds of member organizations (commercial firms, governmental agencies, schools, and others).
Tutorial: W3C Recommendations are developed through a process similar to that of the standards published by other organizations, such as the IETF. The W3C Recommendation Track (i.e., standards track) has four levels of increasing maturity: Working Draft, Candidate Recommendation, Proposed Recommendation, and W3C Recommendation. W3C Recommendations are similar to the standards published by other organizations. (Compare: Internet Standard, ISO.)
(source: , author: , text listed to be reviewed and possibly changed).
This group created and maintains the standard for XML, the markup language for text that has become very popular. XML has always been fully internationalized so that there is no need for a new version to handle international text. However, in some circumstances, XML files may be sensitive to differences among Unicode versions.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - Writing Direction

The direction or orientation of writing characters within lines of text in a writing system. Three directions are common in modern writing systems: left to right, right to left, and top to bottom.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).
A set of rules for using one or more scripts to write a particular language. Examples include the American English writing system, the British English writing system, the French writing system, and the Japanese writing system.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - writing system

A set of rules for using one or more scripts to write a particular language. Examples include the American English writing system, the British English writing system, the French writing system, and the Japanese writing system.
character: A member of a set of elements used for the organization, control, or representation of data. There are at least three common definitions of the word "character":
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
  • a general description of a text entity
    (source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
  • a unit of a writing system, often synonymous with "letter" or similar terms, but generalized to include digits and symbols of various sorts
    (source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
  • the encoded entity itself. When people talk about characters, they usually intend one of the first two definitions.
    (source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
A particular character is identified by its name, not by its shape. A name may suggest a meaning, but the character may be used for representing other meanings as well. A name may suggest a shape, but that does not imply that only that shape is commonly used in print, nor that the particular shape is associated only with that name.
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).

[M] - XML

(N) See: Extensible Markup Language.
(source: , author: , text listed to be reviewed and possibly changed).
XML (which is an approximate abbreviation for Extensible Markup Language) is a popular method for structuring text. XML text is explicitly tagged with charsets. The specification for XML can be found at <http://www.w3.org/XML/>.
ASN.1 text formats: The ASN.1 data description language has many formats for text data. The formats allow for different repertoires and different encodings. Some of the formats that appear in IETF standards based on ASN.1 include IA5String (all ASCII characters), PrintableString (most ASCII characters, but missing many punctuation characters), BMPString (characters from ISO/IEC 10646 plane 0 in UTF-16BE format), UTF8String (just as the name implies), and TeletexString (also called T61String).
(source: RFC 3536bis, author: John Klensin - Paul Hoffman, text listed to be reviewed and possibly changed).
A machine-readable file format for storing structured data. Used, among other things, to represent web pages (in an XML-based form called XHTML). Used by IANA for storing protocol parameter registries.
(source: IANA glossary, author: undisclosed, text listed to be reviewed and possibly changed).
eXtensible Markup Language. A subset of SGML constituting a particular text markup language for interchange of structured data. The Unicode Standard is the reference character set for XML content. (See also SGML and rich text.) XML is a trademark of the World Wide Web Consortium.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - xn--

See: ACE prefix.

[M] - Ypogegrammeni

Greek term for subscript iota, used in polytonic Greek character names.
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Y-variant

Two CJK unified ideographs with identical semantics and non-unifiable shapes, for example, U+732B and U+8C93. (See Z-variant.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Zenkaku

(See fullwidth.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - Zero Width

Characteristic of some spaces or format control characters that do not advance text along the horizontal baseline. (See nonspacing mark.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).

[M] - zonale

A file listing the parameters related to a TLD or to a domain name zone.
(source: ALFA, author: Jefsey Morfin, text listed to be reviewed and possibly changed).

[M] - Zone Variant

Zone Variant: A Preferred or Character Variant Label that is actually to be entered (registered) into the DNS. That is, into the zone file for the relevant zone. Zone Variants are also referred to as Zone Variant Labels or Active (or Activated) Labels.
(source: RFC 3743, author: Kazunori KONISHI, Kenny HUANG, QIAN Hualin, KO YangWoo, text listed to be reviewed and possibly changed).

[M] - Z-variant

Two CJK unified ideographs with identical semantics and unifiable shapes, for example, U+8AAA and U+8AAC. (See Y-variant.)
(source: WiWoW, author: undisclosed, text listed to be reviewed and possibly changed).