Latest revision as of 17:44, 17 April 2024

Unicode is the international standard used to represent text on almost all modern computer systems. Resonite also uses Unicode, so various ProtoFlux nodes and concepts have functionality relevant to it.

Encodings

Unicode defines several encodings that specify how text is represented as bytes in memory, over the network, and in files. Different systems and programming languages use different encodings by default, but they all represent the same sequence of code points.

UTF-8

UTF-8 is the encoding most commonly seen in text files and over the network.^[1] It is a variable-length encoding: each code point is represented by 1 to 4 bytes.
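As a rough illustration of the variable length, the following Python sketch encodes four sample code points and counts the resulting bytes. The sample characters and the name utf8_lengths are chosen here for illustration only; the sketch runs outside Resonite and is not tied to any ProtoFlux node.

```python
# Sample code points, one for each possible UTF-8 length (illustrative choice).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji

# Map each code point to the number of bytes its UTF-8 encoding needs.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in utf8_lengths.items():
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")
```

Code points up to U+007F fit in a single byte, which is why plain ASCII text is also valid UTF-8.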

UTF-16

UTF-16 is used by Resonite itself to represent Strings and chars. Code points inside the "Basic Multilingual Plane" are represented as a single 16-bit value (a char). Like UTF-8, however, UTF-16 is variable length: code points outside the Basic Multilingual Plane (such as most emoji) require 2 chars, which together form a "surrogate pair".
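The surrogate pair mechanism can be sketched in Python as follows. The helper names to_surrogate_pair and utf16_code_units are hypothetical and exist only for this example; the arithmetic follows the standard UTF-16 rule of subtracting 0x10000 and splitting the remaining 20 bits into two halves.

```python
import struct

def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low

def utf16_code_units(text):
    """Return the 16-bit code units of text, using Python's own UTF-16 encoder."""
    data = text.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))

print([hex(u) for u in utf16_code_units("A")])            # one code unit
print([hex(u) for u in utf16_code_units("\U0001F600")])   # two code units
print([hex(u) for u in to_surrogate_pair(0x1F600)])
```

A practical consequence is that the number of chars in a String can be larger than the number of code points it contains.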

UTF-32

UTF-32 represents every code point as a single 32-bit value. Unlike UTF-8 and UTF-16, it is a fixed-length encoding, but it wastes a large amount of space for most text.
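To give a sense of the space cost, this small Python sketch encodes the same short ASCII string in all three encodings and compares the byte counts. The sample text and the name sizes are arbitrary choices for the example; the "-le" codec variants are used so that no byte order mark is added to the output.

```python
text = "Hello, world!"  # 13 code points, all ASCII

# Encoded size in bytes for each encoding (little-endian, no byte order mark).
sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

for enc, n in sizes.items():
    print(f"{enc}: {n} bytes")
```

For text like this, UTF-32 uses four times as many bytes as UTF-8 and twice as many as UTF-16.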

[1] At least in the west.

[1]

@@ Line 12: / Line 12: @@
 [https://en.wikipedia.org/wiki/UTF-16 UTF-16] is used by Resonite itself to represent [[Type:String|String]]s and [[Type:char|char]]s. Most code points are represented as 16-bit values (a [[Type:char|char]]), however like UTF-8 it is variable length, code points outside the "Basic Multilingual Plane" (such as emojis) requiring 2 [[Type:char|char]]s in a "surrogate pair".
 == UTF-32 ==
 [https://en.wikipedia.org/wiki/UTF-32 UTF-32] represents code points as 32-bit values. This is not a variable length encoding like UTF-8 and UTF-16, but it does waste a large amount of space for most text.