Chapter 7: Float

Floating-Point Representation

A single-precision (32-bit) floating-point number is stored in three fields: the sign, the exponent, and the mantissa.

Sign                 Exponent   Mantissa
1 bit (0: +, 1: -)   8 bits     23 bits

Example

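Representation of 1.01011101 × 2^5 (the significand is written in binary). The leading 1 is implicit and is not stored, and the exponent is stored with a bias of 127, so 5 is stored as 132 = 10000100:

Sign   Exponent   Mantissa
0      10000100   01011101000000000000000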

Note: the ^ in 2^5 above is mathematical notation for exponentiation. In C++, the ^ operator is bitwise XOR, not exponentiation.

Using the cmath Library

#include <cmath>

double x{ std::pow(3.0, 4.0) }; // 3 to the 4th power

Operator Precedence and Associativity

Refer to the Table of Operator Precedence and Associativity for detailed information.

Ensure Evaluation Order

Ensure that the expressions or function calls you write are not dependent on operand evaluation order.

printCalculation(getValue(), getValue(), getValue()); // the order in which the arguments are evaluated is unspecified

// Do this instead
int a{ getValue() }; // will execute first
int b{ getValue() }; // will execute second
int c{ getValue() }; // will execute third
printCalculation(a, b, c); // unambiguous

Modulo and Remainder

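In mathematics, modulo and remainder give different results when the operands have different signs:

-21 modulo 4 = 3
-21 remainder 4 = -1

In C++, the % operator computes the remainder, so -21 % 4 evaluates to -1.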

Pre-increment and Post-increment

int x { 5 };
int y { ++x }; // x is incremented to 6, x is evaluated to 6, and 6 is assigned to y

int x { 5 };
int y { x++ }; // x is incremented to 6, copy of original x is evaluated to 5, and 5 is assigned to y

Avoid Undefined Behavior

int value{ add(x, ++x) }; // is this 5 + 6, or 6 + 6? Undefined behavior before C++17; the result is still unspecified since C++17

Comma Operator

std::cout << (++x, ++y) << '\n'; // x evaluated first, y evaluated second, y is printed

Floating-Point Comparisons

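Avoid using == and != with floating-point numbers that have been calculated: small rounding errors can make values that are mathematically equal compare as different.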

std::cout << std::boolalpha << (0.3 == 0.2 + 0.1); // prints false

Precision

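Use == and != only when both operands are known to hold exactly the same stored value, for example when comparing a literal with the same literal of the same type.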

Example of Floating-Point Precision

#include <iomanip> // for std::setprecision()

std::cout << std::setprecision(17); // show 17 digits of precision
std::cout << 3.33333333333333333333333333333333333333f << '\n'; // float
std::cout << 3.33333333333333333333333333333333333333 << '\n'; // double

Special Cases in Floating-Point Arithmetic

double posinf { 5.0 / 0.0 }; // positive infinity (on IEEE 754 implementations)
double nan { 0.0 / 0.0 };    // not a number (mathematically invalid)