shithub: riscv

ref: 8003c8b1e2d5d6e2a22ca7e552b53e631db86df4
parent: bba6d26ca26a60690d50b3fe41a8778abd66cff0
author: cinap_lenrek <[email protected]>
date: Thu Sep 24 08:14:08 EDT 2015

utf(6), rune(2): document 21-bit runes

--- a/sys/man/2/rune
+++ b/sys/man/2/rune
@@ -54,7 +54,7 @@
 and returns the number of bytes copied.
 .BR UTFmax ,
 defined as
-.B 3
+.B 4
 in
 .BR <libc.h> ,
 is the maximum number of bytes required to represent a rune.
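Here is a rough illustration of why UTFmax grows from 3 to 4. The sketch computes how many bytes a rune needs under the value ranges of the conversions in utf(6). The `Rune` typedef, the local `UTFmax` macro and the function `utf_bytes` are hypothetical stand-ins chosen for illustration. They are not the declarations in <libc.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in Plan 9's <libc.h>. */
typedef uint32_t Rune;   /* 32-bit rune, as utf(6) describes after this change */
#define UTFmax 4         /* the value this patch documents */

/*
 * Number of bytes needed to encode r, following the
 * 1-, 2-, 3- and 4-byte conversions in utf(6).
 * Returns 0 for values wider than 21 bits, which no conversion covers.
 */
int
utf_bytes(Rune r)
{
	int n;

	if(r < 0x80)            /* 7 bits  -> 0bbbbbbb */
		n = 1;
	else if(r < 0x800)      /* 11 bits -> 110bbbbb 10bbbbbb */
		n = 2;
	else if(r < 0x10000)    /* 16 bits -> 1110bbbb + 2 continuation bytes */
		n = 3;
	else if(r < 0x200000)   /* 21 bits -> 11110bbb + 3 continuation bytes */
		n = 4;
	else
		return 0;
	assert(n <= UTFmax);    /* a buffer of UTFmax bytes always suffices */
	return n;
}
```

Any rune at or above 0x10000 now needs the fourth byte, which is why a 3-byte UTFmax is no longer enough.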
--- a/sys/man/6/utf
+++ b/sys/man/6/utf
@@ -7,7 +7,7 @@
 .SM UTF-8
 encoding (Universal Character
 Set Transformation Format, 8 bits wide).
-The Unicode Standard represents its characters in 16
+The Unicode Standard represents its characters in 21
 bits;
 .SM UTF-8
 represents such
@@ -19,7 +19,7 @@
 .PP
 In Plan 9, a
 .I rune
-is a 16-bit quantity representing a Unicode character.
+is a 32-bit quantity representing a Unicode character.
 Internally, programs may store characters as runes.
 However, any external manifestation of textual information,
 in files or at the interface between programs, uses a
@@ -65,19 +65,21 @@
 sequence
 as follows:
 .PP
-01.   x in [00000000.0bbbbbbb] → 0bbbbbbb
+001.   x in [00000000.00000000.0bbbbbbb] → 0bbbbbbb
 .br
-10.   x in [00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
+010.   x in [00000000.00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
 .br
-11.   x in [bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb
+011.   x in [00000000.bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb
 .br
+100.   x in [000bbbbb.bbbbbbbb.bbbbbbbb] → 11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
+.br
 .PP
-Conversion 01 provides a one-byte sequence that spans the
+Conversion 001 provides a one-byte sequence that spans the
 .SM ASCII
 character set in a compatible way.
-Conversions 10 and 11 represent higher-valued characters
-as sequences of two or three bytes with the high bit set.
-Plan 9 does not support the 4, 5, and 6 byte sequences proposed by X-Open.
+Conversions 010, 011 and 100 represent higher-valued characters
+as sequences of two, three or four bytes with the high bit set.
+Plan 9 does not support the 5 and 6 byte sequences proposed by X-Open.
 When there are multiple ways to encode a value, for example rune 0,
 the shortest encoding is used.
 .PP
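The conversion table and the shortest-encoding rule in utf(6) can be sketched in C as an encoder and a decoder. This is a minimal illustration under stated assumptions, not Plan 9's runetochar or chartorune. The names `encode_rune` and `decode_rune`, the `Rune` typedef, and the local `UTFmax` are all hypothetical. The decoder rejects overlong forms, such as C0 80 for rune 0. It also rejects lead bytes of the 5- and 6-byte sequences that the page says Plan 9 does not support.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Rune;   /* hypothetical: 32-bit rune as in utf(6) */
#define UTFmax 4         /* hypothetical local copy of the documented value */

/*
 * Encode r into s following conversions 001, 010, 011 and 100.
 * Returns the byte count, or 0 if r does not fit in 21 bits.
 */
int
encode_rune(unsigned char s[UTFmax], Rune r)
{
	if(r < 0x80){                 /* 001: 0bbbbbbb */
		s[0] = r;
		return 1;
	}
	if(r < 0x800){                /* 010: 110bbbbb 10bbbbbb */
		s[0] = 0xC0 | (r >> 6);
		s[1] = 0x80 | (r & 0x3F);
		return 2;
	}
	if(r < 0x10000){              /* 011: 1110bbbb 10bbbbbb 10bbbbbb */
		s[0] = 0xE0 | (r >> 12);
		s[1] = 0x80 | ((r >> 6) & 0x3F);
		s[2] = 0x80 | (r & 0x3F);
		return 3;
	}
	if(r < 0x200000){             /* 100: 11110bbb + three 10bbbbbb */
		s[0] = 0xF0 | (r >> 18);
		s[1] = 0x80 | ((r >> 12) & 0x3F);
		s[2] = 0x80 | ((r >> 6) & 0x3F);
		s[3] = 0x80 | (r & 0x3F);
		return 4;
	}
	return 0;
}

/*
 * Decode one rune from s[0..n-1] into *r.
 * Returns bytes consumed, or 0 for a malformed sequence or an
 * overlong one (a value that has a shorter encoding).
 */
int
decode_rune(Rune *r, const unsigned char *s, int n)
{
	/* smallest value each sequence length may carry (index = length) */
	static const Rune minval[] = { 0, 0, 0x80, 0x800, 0x10000 };
	int len, i;
	Rune v;

	if(n < 1)
		return 0;
	if(s[0] < 0x80){ len = 1; v = s[0]; }
	else if((s[0] & 0xE0) == 0xC0){ len = 2; v = s[0] & 0x1F; }
	else if((s[0] & 0xF0) == 0xE0){ len = 3; v = s[0] & 0x0F; }
	else if((s[0] & 0xF8) == 0xF0){ len = 4; v = s[0] & 0x07; }
	else
		return 0;   /* stray continuation byte or 5/6-byte lead */
	assert(len <= UTFmax);
	if(n < len)
		return 0;
	for(i = 1; i < len; i++){
		if((s[i] & 0xC0) != 0x80)
			return 0;   /* continuation bytes must be 10bbbbbb */
		v = (v << 6) | (s[i] & 0x3F);
	}
	if(v < minval[len])
		return 0;   /* overlong: a shorter encoding exists */
	*r = v;
	return len;
}
```

Both functions follow the bit patterns, which allow values up to 0x1FFFFF. Unicode assigns code points only up to U+10FFFF and reserves U+D800 to U+DFFF for surrogates, so a stricter decoder might reject those values as well.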