Chapter 7: Float

Floating-Point Representation

Floating-point numbers are represented using three parts: the sign, exponent, and mantissa.

Sign	Exponent	Mantissa
1 bit (0: +, 1: -)	8 bits	23 bits

Example

Representation of 1.01011101 × 2^5:

Sign	Exponent	Mantissa
0	10000100	01011101000000000000000

The mantissa's leading 1 is dropped because it is always 1.
The exponent value is the exponent field minus 127 (e.g., 10000100 = 132 = 127 + 5).

Note: ^ is not exponential; it is XOR.

Using the `cmath` Library

#include <cmath>

double x{ std::pow(3.0, 4.0) }; // 3 to the 4th power

Operator Precedence and Associativity

Refer to the Table of Operator Precedence and Associativity for detailed information.

Ensure Evaluation Order

Ensure that the expressions or function calls you write are not dependent on operand evaluation order.

printCalculation(getValue(), getValue(), getValue()); // this line is ambiguous

// Do this instead
int a{ getValue() }; // will execute first
int b{ getValue() }; // will execute second
int c{ getValue() }; // will execute third
printCalculation(a, b, c); // unambiguous

Modulo and Remainder

In mathematics:

-21 modulo 4 = 3
-21 remainder 4 = -1

Pre-increment and Post-increment

int x { 5 };
int y { ++x }; // x is incremented to 6, x is evaluated to 6, and 6 is assigned to y

int x { 5 };
int y { x++ }; // x is incremented to 6, copy of original x is evaluated to 5, and 5 is assigned to y

Avoid Undefined Behavior

int value{ add(x, ++x) }; // undefined behavior: is this 5 + 6, or 6 + 6?

Comma Operator

std::cout << (++x, ++y) << '\n'; // x evaluated first, y evaluated second, y is printed

Floating-Point Comparisons

Avoid using == and != with floating-point numbers.

std::cout << std::boolalpha << (0.3 == 0.2 + 0.1); // prints false

Precision

Float: minimum precision of 6 digits
Double: minimum precision of 15 significant digits

Use == and !=:

With the same floating type
Not exceeding the minimum digits

Example of Floating-Point Precision

#include <iomanip> // for std::setprecision()

std::cout << std::setprecision(17); // show 17 digits of precision
std::cout << 3.33333333333333333333333333333333333333f << '\n'; // float
std::cout << 3.33333333333333333333333333333333333333 << '\n'; // double

Special Cases in Floating-Point Arithmetic

double posinf { 5.0 / 0.0 }; // positive infinity
double nan { 0.0 / 0.0 };     // not a number (mathematically invalid)