New in version 0.12.
IEEE-754 is a standard for the representation of and computations with floating point numbers in binary systems. It is widely used by floating point implementations in CPUs. These functions implement encoding and decoding binary representations of floating point numbers according to IEEE-754.
An IEEE-754 binary float consists of three parts: a sign bit, the exponent, and the significand (sometimes called the mantissa). For normal numbers, the value is calculated from these parts using the following formula: (-1) ^ sign * 2 ^ (exponent - bias) * 1.significand. The standard defines multiple binary formats of different sizes that all follow these rules but differ in the number of bits allocated for the exponent and significand. The bias for the default formats is defined as bias = (2 ^ (exponent_bits - 1)) - 1.
See this article for a more detailed introduction to the subject.
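As a rough illustration of the formula, the following Python sketch computes the value of a normal number from its three raw fields. The function name `ieee754_value` and its argument order are assumptions made for this example, not part of the documented API, and special cases (zero, subnormals, infinities, NaN) are deliberately left out.

```python
def ieee754_value(sign, exponent, significand, exponent_bits, significand_bits):
    """Value of a *normal* IEEE-754 binary float from its raw fields (sketch)."""
    # bias = (2 ^ (exponent_bits - 1)) - 1
    bias = (1 << (exponent_bits - 1)) - 1
    # "1.significand": an implicit leading 1 followed by the fraction bits
    mantissa = 1 + significand / (1 << significand_bits)
    return (-1) ** sign * 2.0 ** (exponent - bias) * mantissa


# binary16 has 5 exponent bits and 10 significand bits, so bias = 15.
print(ieee754_value(0, 15, 0, 5, 10))    # 1.0
print(ieee754_value(1, 16, 512, 5, 10))  # -3.0
```

In the second call, the significand 512 out of 1024 contributes 0.5, so the value is -1 * 2 ^ (16 - 15) * 1.5.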
The following binary float formats are defined by the standard:
Name | Also known as | Exponent bits | Significand bits |
---|---|---|---|
binary16 | Half precision | 5 | 10 |
binary32 | Single precision | 8 | 23 |
binary64 | Double precision | 11 | 52 |
binary128 | Quad precision | 15 | 112 |
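The total width and the default bias of each format follow directly from the two bit counts in the table. The short Python sketch below derives both; the `FORMATS` dictionary and the `describe` helper are names chosen for this example only.

```python
# (exponent_bits, significand_bits) for each format listed in the table
FORMATS = {
    "binary16": (5, 10),
    "binary32": (8, 23),
    "binary64": (11, 52),
    "binary128": (15, 112),
}


def describe(exponent_bits, significand_bits):
    """Return (total width in bits, default exponent bias)."""
    total_bits = 1 + exponent_bits + significand_bits  # 1 sign bit
    bias = (1 << (exponent_bits - 1)) - 1
    return total_bits, bias


for name, (e, s) in FORMATS.items():
    total_bits, bias = describe(e, s)
    print(f"{name}: {total_bits} bits, bias {bias}")
# binary16: 16 bits, bias 15
# binary32: 32 bits, bias 127
# binary64: 64 bits, bias 1023
# binary128: 128 bits, bias 16383
```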
In many programming languages, the binary32 format is available as float and binary64 is available as double.
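To make the layout concrete, the following Python sketch packs a number as binary32 and binary64 with the standard `struct` module and splits the resulting bit patterns into sign, exponent, and significand. The helper `split_fields` is a hypothetical name used only here, and treating the representation as an unsigned integer is an assumption of the example.

```python
import struct


def split_fields(bits, exponent_bits, significand_bits):
    """Split an unsigned integer bit pattern into (sign, exponent, significand)."""
    sign = bits >> (exponent_bits + significand_bits)
    exponent = (bits >> significand_bits) & ((1 << exponent_bits) - 1)
    significand = bits & ((1 << significand_bits) - 1)
    return sign, exponent, significand


# ">f" is binary32 and ">d" is binary64 in struct's standard (big-endian) mode.
bits32 = struct.unpack(">I", struct.pack(">f", -6.25))[0]
bits64 = struct.unpack(">Q", struct.pack(">d", -6.25))[0]

print(hex(bits32), split_fields(bits32, 8, 23))   # 0xc0c80000 (1, 129, 4718592)
print(hex(bits64), split_fields(bits64, 11, 52))
```

Here -6.25 equals -1.5625 * 2 ^ 2, so the binary32 exponent field is 2 + 127 = 129 and the significand field holds 0.5625 * 2 ^ 23.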
ieee754_encode(x; exponent_bits; significand_bits[; exponent_bias])
Encode a floating point number into an IEEE-754 binary representation.
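What such an encoding step involves can be sketched in Python as follows. This is not the implementation behind ieee754_encode: the name `encode_sketch`, the choice to return the bit pattern as an unsigned integer, the default bias when none is given, round-to-nearest-even rounding, and the quiet NaN pattern are all assumptions made for illustration.

```python
import math
from fractions import Fraction


def encode_sketch(x, exponent_bits, significand_bits, exponent_bias=None):
    """Hypothetical helper: bit pattern of x as an unsigned integer."""
    if exponent_bias is None:
        # Assumed default: the bias formula of the standard formats.
        exponent_bias = (1 << (exponent_bits - 1)) - 1
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    sign_field = sign << (exponent_bits + significand_bits)
    # Exponent field of all ones with a zero significand encodes infinity.
    inf_bits = ((1 << exponent_bits) - 1) << significand_bits

    if math.isnan(x):
        # Quiet NaN: all-ones exponent, top significand bit set.
        return sign_field | inf_bits | (1 << (significand_bits - 1))
    if math.isinf(x):
        return sign_field | inf_bits
    if x == 0:
        return sign_field

    magnitude = Fraction(abs(x))      # exact value, so rounding happens once
    _, e = math.frexp(abs(x))         # abs(x) == m * 2**e with 0.5 <= m < 1
    exponent = e - 1 + exponent_bias  # biased exponent of 1.f * 2**(e - 1)
    if exponent <= 0:
        # Subnormal: exponent field 0 and no implicit leading 1.
        bits = round(magnitude * Fraction(2) ** (exponent_bias - 1 + significand_bits))
    else:
        # scaled includes the implicit leading 1; adding it to (exponent - 1)
        # lets a rounding carry bump the exponent field automatically.
        scaled = round(magnitude * Fraction(2) ** (significand_bits - (e - 1)))
        bits = ((exponent - 1) << significand_bits) + scaled
    # Values too large for the format become infinity.
    return sign_field | min(bits, inf_bits)


print(hex(encode_sketch(-6.25, 8, 23)))  # 0xc0c80000
print(hex(encode_sketch(1.0, 5, 10)))    # 0x3c00
```

Using `Fraction` keeps the arithmetic exact, and `round` on a `Fraction` rounds halves to even, which matches the default IEEE-754 rounding mode.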
ieee754_decode(x; exponent_bits; significand_bits[; exponent_bias])
Calculate the value of an IEEE-754 binary float.
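The reverse direction can be sketched in the same spirit. Again, `decode_sketch` is a hypothetical name rather than the function documented here; it assumes the input is an unsigned integer bit pattern and returns a Python float, so quad-precision values outside the binary64 range cannot be represented by this sketch.

```python
import math


def decode_sketch(bits, exponent_bits, significand_bits, exponent_bias=None):
    """Hypothetical helper: value of an IEEE-754 bit pattern as a Python float."""
    if exponent_bias is None:
        # Assumed default: the bias formula of the standard formats.
        exponent_bias = (1 << (exponent_bits - 1)) - 1
    sign = (bits >> (exponent_bits + significand_bits)) & 1
    exponent = (bits >> significand_bits) & ((1 << exponent_bits) - 1)
    significand = bits & ((1 << significand_bits) - 1)
    factor = -1.0 if sign else 1.0

    if exponent == (1 << exponent_bits) - 1:
        # All-ones exponent: infinity if the significand is zero, NaN otherwise.
        return factor * math.inf if significand == 0 else math.nan
    if exponent == 0:
        # Zero and subnormals: no implicit leading 1, exponent fixed at 1 - bias.
        return factor * math.ldexp(significand, 1 - exponent_bias - significand_bits)
    # Normal numbers: restore the implicit leading 1, then scale by the exponent.
    full = (1 << significand_bits) | significand
    return factor * math.ldexp(full, exponent - exponent_bias - significand_bits)


print(decode_sketch(0xC0C80000, 8, 23))  # -6.25
print(decode_sketch(0x3C00, 5, 10))      # 1.0
```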
ieee754_half_encode(x)
Encode x in the half-precision binary format.
ieee754_half_decode(x)
Decode the half-precision binary float x.
ieee754_single_encode(x)
Encode x in the single-precision binary format.
ieee754_single_decode(x)
Decode the single-precision binary float x.
ieee754_double_encode(x)
Encode x in the double-precision binary format.
ieee754_double_decode(x)
Decode the double-precision binary float x.
ieee754_quad_encode(x)
Encode x in the quad-precision binary format.
ieee754_quad_decode(x)
Decode the quad-precision binary float x.
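To see how the fixed-size formats differ in practice, the following Python sketch encodes the same value as half, single, and double precision and decodes it again. It relies on the `struct` format codes `e`, `f`, and `d` rather than on the functions documented here; `to_bits` and `from_bits` are names invented for the example. Python's `struct` has no quad-precision code, so that format is omitted.

```python
import struct

# (float format code, unsigned integer code of the same width)
HALF = ("e", "H")
SINGLE = ("f", "I")
DOUBLE = ("d", "Q")


def to_bits(x, float_code, int_code):
    """Encode x in the given format and return the bits as an integer."""
    return struct.unpack(">" + int_code, struct.pack(">" + float_code, x))[0]


def from_bits(bits, float_code, int_code):
    """Decode an integer bit pattern in the given format."""
    return struct.unpack(">" + float_code, struct.pack(">" + int_code, bits))[0]


print(hex(to_bits(1.5, *HALF)))    # 0x3e00
print(hex(to_bits(1.5, *SINGLE)))  # 0x3fc00000
print(hex(to_bits(1.5, *DOUBLE)))  # 0x3ff8000000000000

# 0.1 is not exactly representable, so narrower formats lose more precision.
print(from_bits(to_bits(0.1, *HALF), *HALF))      # 0.0999755859375
print(from_bits(to_bits(0.1, *DOUBLE), *DOUBLE))  # 0.1
```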