Unicode

Unicode is a standard for encoding written characters and text, used by operating systems and software around the world[1]. Since math is commonly entered into computers, both for calculating results and for typesetting, this website uses the Unicode standard in the symbol resources it provides. See the pages listed below for navigating math Unicode symbols on this site.

Code Points

Each character in the Unicode standard is identified by a code point, a number that uniquely identifies the character and is conventionally written in hexadecimal. This website lists the code point for each math symbol, where the U+ prefix indicates that the hexadecimal value is a Unicode code point. For example, the π (pi) symbol has the code point shown below.

U+03C0
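As a minimal sketch of moving between a character and its code point in JavaScript, the built-in String.fromCodePoint and codePointAt methods can be used. The helper name toCodePointLabel is illustrative and is not part of this site's resources.

```javascript
// Format the first character of a string as a "U+XXXX" label.
// (toCodePointLabel is a hypothetical helper name.)
function toCodePointLabel(ch) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase()
  return "U+" + hex.padStart(4, "0")
}

console.log(String.fromCodePoint(0x03C0)) // prints π
console.log(toCodePointLabel("π")) // prints U+03C0
```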

Code points are stored as bytes using an encoding format: UTF-8, UTF-16 or UTF-32. UTF-8 and UTF-16 are variable-width formats, meaning the number of code units per character varies, while UTF-32 always uses a single four-byte (32 bit) code unit. For example, the π symbol is encoded using two one-byte (8 bit) code units in the UTF-8 format and one two-byte (16 bit) code unit in the UTF-16 format. The encodings of the π symbol (U+03C0) are shown below.

Format Encoding
UTF-8 0xCF 0x80
UTF-16 0x03C0
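The values in this table can be reproduced with a short sketch, assuming an environment that provides the standard TextEncoder API (modern browsers and Node.js). The function names utf8Bytes and utf16Units are illustrative.

```javascript
// Return the UTF-8 bytes of a string as hex labels.
function utf8Bytes(s) {
  const bytes = new TextEncoder().encode(s) // TextEncoder always produces UTF-8
  return Array.from(bytes, b => "0x" + b.toString(16).toUpperCase().padStart(2, "0"))
}

// Return the UTF-16 code units of a string as hex labels.
// JavaScript strings are indexed by UTF-16 code unit, so charCodeAt reads them directly.
function utf16Units(s) {
  const units = []
  for (let i = 0; i < s.length; i++) {
    units.push("0x" + s.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0"))
  }
  return units
}

console.log(utf8Bytes("π").join(" ")) // prints 0xCF 0x80
console.log(utf16Units("π").join(" ")) // prints 0x03C0
```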

This website uses the UTF-8 format. Every character in the Unicode standard can be encoded using one to four one-byte (8 bit) code units in the UTF-8 format. Note, the symbol resources do not provide the encodings for math symbols, just the code points.
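To illustrate the one-to-four byte range, the sketch below counts the UTF-8 bytes of four sample characters. It assumes the standard TextEncoder API is available, and the function name utf8Length is illustrative.

```javascript
// Count the UTF-8 bytes needed to encode a string.
function utf8Length(s) {
  return new TextEncoder().encode(s).length
}

console.log(utf8Length("\u{0078}")) // x, prints 1
console.log(utf8Length("\u{03C0}")) // π, prints 2
console.log(utf8Length("\u{2211}")) // ∑ (n-ary summation), prints 3
console.log(utf8Length("\u{1D49C}")) // 𝒜 (mathematical script capital A), prints 4
```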

Combining Characters

Math sometimes uses the combining characters of the Unicode standard to attach a mark to a base character so that the two display as a single symbol. For example, the x̄ (x bar) symbol is a combination of the Latin small letter x (U+0078) and the combining macron character (U+0304). This is illustrated in the diagram below.

Visual of combining U+0078 (latin small letter x) and U+0304 (combining macron) to form x-bar symbol

Note, the combining character follows the character it is being combined with. A common alternative to combining characters is to use a math typesetting system such as LaTeX, where accent commands achieve the same effect. For example, the x bar symbol would be written as the plain text \bar{x} instead of with a combining character.

Below are the combining characters referenced on this site.

JavaScript Example

Programming languages have become friendlier to the Unicode standard over time. This example demonstrates how to build and inspect some math Unicode symbols. To write a combining character sequence in JavaScript, the following syntax can be used.

console.log("\u{0078}\u{0304}") // prints x̄ (x bar)

This is equivalent to the following expression.

console.log("x\u{0304}") // prints x̄ (x bar)

Note, from the appearance of this result you might think the length of this string is 1, but it is in fact 2, because length counts UTF-16 code units, as shown below.

console.log("x̄".length) // prints 2
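If the goal is to count what a reader perceives as a single character, one option is the standard Intl.Segmenter API. This is a sketch that assumes the runtime supports it (recent browsers and Node.js 16 or later). The function name graphemeCount is illustrative.

```javascript
// Count user-perceived characters (grapheme clusters) instead of code units.
function graphemeCount(s) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" })
  return Array.from(segmenter.segment(s)).length
}

console.log(graphemeCount("x\u{0304}")) // prints 1
console.log("x\u{0304}".length) // prints 2
```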

Finally, here is a function that prints the code points that make up a string.

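function analyze(s) {
  console.log("input:" + s)
  console.log("length:" + s.length) // length counts UTF-16 code units
  // for...of iterates by code point, so symbols above U+FFFF are not split
  for (const ch of s) {
    const hex = ch.codePointAt(0).toString(16).toUpperCase()
    console.log("U+" + hex.padStart(4, "0"))
  }
}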

Given the following input:

analyze("x̄")

The function produces the output:

input:x̄
length:2
U+0078
U+0304
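The x̄ input stays within the range U+0000 to U+FFFF, where each code point fits in one UTF-16 code unit. Some math symbols, such as mathematical script capital A (U+1D49C), lie above that range and occupy two code units in a JavaScript string. The sketch below, with the illustrative function name codePoints, lists whole code points by iterating the string and calling codePointAt.

```javascript
// List the code points of a string as "U+XXXX" labels.
// Iterating a string with for...of walks code points, not UTF-16 code units.
function codePoints(s) {
  const labels = []
  for (const ch of s) {
    labels.push("U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"))
  }
  return labels
}

const scriptA = "\u{1D49C}" // 𝒜 (mathematical script capital A)
console.log(scriptA.length) // prints 2
console.log(codePoints(scriptA).join(" ")) // prints U+1D49C
console.log(codePoints("x\u{0304}").join(" ")) // prints U+0078 U+0304
```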

References

  1. Unicode, Wikipedia