Unicode: Difference between revisions

From Resonite Wiki
Created page with "[https://en.wikipedia.org/wiki/Unicode Unicode] is the international standard used to represent text on almost all modern computer systems. Resonite also uses Unicode, so various ProtoFlux nodes and concepts have functionality relevant for it."
 
Add some info about encodings, since we have stuff mentioning that.
Line 1: Line 1:
[https://en.wikipedia.org/wiki/Unicode Unicode] is the international standard used to represent text on almost all modern computer systems. Resonite also uses Unicode, so various [[ProtoFlux]] nodes and concepts have functionality relevant for it.
[https://en.wikipedia.org/wiki/Unicode Unicode] is the international standard used to represent text on almost all modern computer systems. [[Resonite]] also uses Unicode, so various [[ProtoFlux]] nodes and concepts have functionality relevant for it.
 
= Encodings =
 
Unicode defines several ''encodings'' that specify how text is represented as bytes in memory, over the network, and in files. Different systems and programming languages use different encodings by default, but they all represent the same underlying sequence of code points.
 
== UTF-8 ==
 
[https://en.wikipedia.org/wiki/UTF-8 UTF-8] is the encoding most commonly seen in text files and over the network.<ref>At least in the west.</ref> It is a variable-length encoding: a single code point is represented as one to four bytes of data.
 
== UTF-16 ==
 
[https://en.wikipedia.org/wiki/UTF-16 UTF-16] is used by Resonite itself to represent [[Type:String|String]]s and [[Type:char|char]]s. Most commonly used code points are represented as a single 16-bit value (a [[Type:char|char]]). Like UTF-8, however, it is a variable-length encoding: code points outside the "Basic Multilingual Plane" (such as emojis) require two [[Type:char|char]]s, which together form a "surrogate pair".
 
 
== UTF-32 ==
 
[https://en.wikipedia.org/wiki/UTF-32 UTF-32] represents every code point as a single 32-bit value. Unlike UTF-8 and UTF-16, it is a fixed-length encoding, but it wastes a large amount of space for most text.

Revision as of 17:44, 17 April 2024

Unicode is the international standard used to represent text on almost all modern computer systems. Resonite also uses Unicode, so various ProtoFlux nodes and concepts have functionality relevant for it.

Encodings

Unicode defines several encodings that specify how text is represented as bytes in memory, over the network, and in files. Different systems and programming languages use different encodings by default, but they all represent the same underlying sequence of code points.

UTF-8

UTF-8 is the encoding most commonly seen in text files and over the network.[1] It is a variable-length encoding: a single code point is represented as one to four bytes of data.
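
The variable length of UTF-8 can be seen by encoding a few characters and counting the resulting bytes. The following is a minimal sketch in Python rather than anything Resonite-specific; the sample characters and the helper name `utf8_length` are chosen purely for illustration.

```python
# Sample characters picked for illustration, one per UTF-8 length class.
samples = ["A", "é", "€", "😀"]


def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs to encode the given text."""
    return len(ch.encode("utf-8"))


for ch in samples:
    # ord() gives the code point; the byte count grows with its value.
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
```

Code points in the ASCII range take a single byte, which is one reason UTF-8 is compact for mostly-English text.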

UTF-16

UTF-16 is used by Resonite itself to represent Strings and chars. Most commonly used code points are represented as a single 16-bit value (a char). Like UTF-8, however, it is a variable-length encoding: code points outside the "Basic Multilingual Plane" (such as emojis) require two chars, which together form a "surrogate pair".
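
To make the surrogate pair concrete, the sketch below splits text into its 16-bit UTF-16 code units, which correspond to what this page calls chars. It is written in Python for illustration only: the helper name `utf16_units` is hypothetical, and Python's own strings count code points rather than UTF-16 units, so the encoding step is done explicitly.

```python
def utf16_units(text: str) -> list[int]:
    """Return the 16-bit code units of the UTF-16 encoding of text."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [
        int.from_bytes(data[i:i + 2], "little")
        for i in range(0, len(data), 2)
    ]


# A character inside the Basic Multilingual Plane: one code unit.
print([hex(u) for u in utf16_units("A")])

# An emoji outside the Basic Multilingual Plane: two code units,
# a high surrogate followed by a low surrogate.
print([hex(u) for u in utf16_units("😀")])
```

Because the emoji occupies two code units, any system that measures text length in UTF-16 units would be expected to count it as two chars, not one.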


UTF-32

UTF-32 represents every code point as a single 32-bit value. Unlike UTF-8 and UTF-16, it is a fixed-length encoding, but it wastes a large amount of space for most text.
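
The space trade-off between the three encodings can be checked by encoding the same text with each and comparing sizes. This is a rough Python sketch for illustration; the helper name `encoded_sizes` and the sample strings are assumptions, and the little-endian variants are used only so that no byte order mark is counted.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the encoded size of text in bytes for each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(text.encode(name)) for name in encodings}


# Plain ASCII text: UTF-32 uses four times as many bytes as UTF-8.
print(encoded_sizes("Hello"))

# A single emoji: all three encodings need four bytes.
print(encoded_sizes("😀"))
```

For text made up mostly of ASCII characters, UTF-32 spends three padding bytes on every code point, which is the wasted space described above.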

  1. At least in the west.