Unicode

From Resonite Wiki

Latest revision as of 17:44, 17 April 2024

Unicode is the international standard used to represent text on almost all modern computer systems. Resonite also uses Unicode, so various ProtoFlux nodes and concepts have functionality related to it.

Encodings

Unicode defines several encodings that specify how text is represented as bytes in memory, over the network, and in files. Different systems and programming languages use different encodings by default, but they all represent the same data.

UTF-8

UTF-8 is the encoding most commonly seen in text files and over the network.[1] It is variable length, with a single code point being represented as 1 to 4 bytes of data.
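
As a rough illustration of the variable length, the following Python sketch counts the UTF-8 bytes needed for a few sample code points. Python is used here only for illustration and is not how Resonite itself handles text; the helper name `utf8_length` and the sample characters are assumptions made for this example.

```python
# Sample code points chosen for illustration (not taken from Resonite):
# "A", "é" (U+00E9), "€" (U+20AC), and an emoji (U+1F600).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def utf8_length(ch):
    """Return how many bytes UTF-8 needs to encode the given text."""
    return len(ch.encode("utf-8"))


for ch in samples:
    # ord() gives the code point number of a single-character string.
    print(f"U+{ord(ch):04X} needs {utf8_length(ch)} byte(s) in UTF-8")
```

Plain ASCII characters take a single byte, while code points further up the range take progressively more, up to four.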

UTF-16

UTF-16 is used by Resonite itself to represent Strings and chars. Most code points are represented as a single 16-bit value (a char). However, like UTF-8, it is variable length: code points outside the "Basic Multilingual Plane" (such as emojis) require 2 chars in a "surrogate pair".
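
The surrogate pair calculation can be sketched in Python as follows. This is an illustration only: Python strings count code points rather than 16-bit units, so the sketch encodes to UTF-16 explicitly. The function names `utf16_units` and `to_surrogate_pair` are hypothetical helpers written for this example, not Resonite or ProtoFlux functionality.

```python
def utf16_units(text):
    """Return the 16-bit UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the Basic Multilingual Plane")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


print([hex(u) for u in utf16_units("A")])                # one 16-bit unit
print([hex(u) for u in utf16_units("\U0001F600")])       # two 16-bit units
print([hex(u) for u in to_surrogate_pair(0x1F600)])      # same two values
```

A single emoji therefore occupies two 16-bit units, which is why it takes 2 chars even though it is one code point.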

UTF-32

UTF-32 represents each code point as a single 32-bit value. Unlike UTF-8 and UTF-16, it is not a variable length encoding, but it wastes a large amount of space for most text.
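
To make the space trade-off concrete, here is a small Python sketch comparing the encoded size of the same text in all three encodings. The helper `encoded_sizes` and the sample strings are assumptions made for this example; the little-endian codec variants are used so that no byte order mark is added to the counts.

```python
def encoded_sizes(text):
    """Return the size in bytes of text in UTF-8, UTF-16 and UTF-32."""
    codecs = (("UTF-8", "utf-8"), ("UTF-16", "utf-16-le"), ("UTF-32", "utf-32-le"))
    return {name: len(text.encode(codec)) for name, codec in codecs}


# 12 ASCII characters: UTF-32 spends 4 bytes on each of them.
print(encoded_sizes("Hello, world"))

# A single emoji (U+1F600) takes 4 bytes in every one of the three encodings.
print(encoded_sizes("\U0001F600"))
```

For mostly ASCII text, UTF-32 ends up four times the size of UTF-8, while each code point always takes exactly 4 bytes.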

  1. At least in the west.