How to Compute the Angle Between Two Vectors in a Stable Way

Your trigonometry teacher must have told you that, given two vectors, their angle can be readily obtained by taking the arccosine of their dot product divided by the product of their norms:

Well, it's mathematically correct, but for computers there is a complication: numerical stability. For a very small angle, where the two vectors are almost parallel, the ratio of the dot product to the product of the norms can be very close to 1, and the derivative of arccos approaches infinity as its argument approaches 1. This means that even a small precision loss in the cosine value can result in a huge precision loss after arccos is applied.
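To make this concrete, here is a minimal sketch (assuming NumPy; the vectors and the test angle are my own choice, not from the original post) of the naive formula. For a tiny angle, the computed cosine rounds to exactly 1 and the recovered angle collapses to 0.

```python
import numpy as np

def naive_angle(a, b):
    # arccos of the dot product over the product of norms -- the textbook formula.
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clamp to [-1, 1] so rounding just outside the range does not produce NaN.
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

theta = 1e-8                                   # true angle in radians
a = np.array([1.0, 0.0])
b = np.array([np.cos(theta), np.sin(theta)])   # unit vector at angle theta from a

print(naive_angle(a, b))   # 0.0 -- cos(1e-8) rounds to exactly 1.0 in float64
print(theta)               # 1e-08, the answer we wanted
```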

Suppose the mathematically exact value and its floating-point representation in the computer differ by a small relative error. Then the relative error after the function is applied can be written as:

When the cosine approaches 1, this relative error grows to infinity even if the input's own relative error stays fixed. In other words, the precision loss embedded in the input number is magnified by the function that transforms it. We call this kind of numerical instability extrinsic instability.

Catastrophic Cancellation

According to Wikipedia, catastrophic cancellation is the phenomenon that subtracting approximations of two close values may yield a very bad approximation to the difference of the original numbers.

Imagine two real numbers whose rounded floating-point representations each carry a relatively small rounding error. The relative error of their computed difference, however, can be arbitrarily large:

The relative error stays small if the two operands happen to carry the same relative error, but this cannot generally be ensured in practice. Plugging in concrete values for the operands and their individual rounding errors, we can estimate the relative error as follows:

This shows that if the two numbers are close enough, the relative error of the difference can be large, much larger than the rounding error of either operand. Note, however, that catastrophic cancellation concerns only the relative error, not the absolute error: even when the relative error is huge, the absolute error can still be tiny compared to the operands' own magnitude. We care about the relative error because it directly reflects how many digits of the significand remain correct. For example, a floating-point representation that agrees with the actual number in only half of its significand digits has lost half of its precision, even though the absolute error, scaled down by the number's magnitude, is very small.
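A small Python experiment (with numbers I picked for illustration) makes the effect visible: two nearby reals are each rounded to float64 with a tiny relative error, yet the relative error of their computed difference is about ten orders of magnitude larger.

```python
from fractions import Fraction

# Two close real numbers and their float64 roundings.
x_exact = Fraction(1, 3)
y_exact = Fraction(1, 3) + Fraction(1, 10**12)

fx, fy = float(x_exact), float(y_exact)   # each rounding error is ~1e-17 in absolute terms

diff_float = fy - fx                      # the subtraction itself is exact here (Sterbenz),
exact_diff = y_exact - x_exact            # but it exposes the two rounding errors

rel_err = abs(Fraction(diff_float) - exact_diff) / exact_diff
print(float(rel_err))                     # on the order of 1e-5 instead of 1e-16
```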

So what about the other arithmetic operations, such as addition, multiplication, and division? Let's examine them.

For addition:

We can see that the relative error of addition stays bounded as long as one operand is not close to the negation of the other (in which case it degenerates into the subtraction case). Next we examine the relative error of multiplication.

which is tightly bounded.

You can also check that division is similarly bounded by the relative errors introduced by each floating-point operand.
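Here is a quick check of this claim using exact rational arithmetic (the operand values and the perturbation are arbitrary choices of mine): a fixed relative perturbation on the inputs stays roughly the same size through addition, multiplication, and division, but is amplified by subtraction of nearly equal values.

```python
from fractions import Fraction

def propagated_rel_err(f, a, b, delta):
    # Perturb the operands by +/- delta and measure the relative error of the result.
    exact = f(a, b)
    perturbed = f(a * (1 + delta), b * (1 - delta))
    return abs((perturbed - exact) / exact)

a, b = Fraction(10**6 + 1), Fraction(10**6)   # two nearly equal operands
delta = Fraction(1, 10**12)

for name, f in [("add", lambda x, y: x + y), ("sub", lambda x, y: x - y),
                ("mul", lambda x, y: x * y), ("div", lambda x, y: x / y)]:
    print(name, float(propagated_rel_err(f, a, b, delta)))
# add, mul and div stay at or below about 2e-12; sub jumps to about 2e-6
# because a and b agree in their leading digits.
```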

As an example of subtraction going wrong, consider calculating the following value when the input is small enough:

From calculus we know what the limit should be, but due to precision loss in the evaluated term when the input is tiny, the numerator becomes exactly 0, and zero divided by a non-zero value is simply zero.
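Assuming the expression in question is (1 − cos x) / x² (the original equation is not reproduced above, so this is my reading of it), the following sketch shows both the naive evaluation and a rewrite that avoids the cancellation via the half-angle identity 1 − cos x = 2 sin²(x/2).

```python
import math

x = 1e-9

# Naive form: cos(x) rounds to exactly 1.0, the numerator cancels to zero,
# and the whole expression collapses to 0 instead of the limit 1/2.
naive = (1.0 - math.cos(x)) / (x * x)

# Stable form: 1 - cos(x) == 2 * sin(x/2)**2, no subtraction of close values.
stable = 2.0 * math.sin(x / 2) ** 2 / (x * x)

print(naive)    # 0.0
print(stable)   # 0.5, matching the limit from calculus
```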

Matt Pharr's post gives another great example illustrating this problem. With the 32-bit floating-point representation, the product of two large floating-point numbers is rounded to the nearest representable value, losing its low-order bits. This tells us that floating-point numbers naturally come with precision loss, but when we arrange the calculation properly, the loss can be significantly mitigated. The general idea is to maintain as much precision as possible in each intermediate expression and not to leave a subtraction of nearly equal values as the final step.
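As a tiny illustration of that rounding (the concrete operands below are mine, not the ones from Pharr's post), multiplying two seven-digit values in float32 already loses the low-order digits of the exact product:

```python
import numpy as np

a = np.float32(10000001.0)   # exactly representable in float32 (below 2**24)
b = np.float32(10000003.0)

prod32 = a * b                            # rounded to a 24-bit significand
prod64 = np.float64(a) * np.float64(b)    # exact here, since the product fits in 53 bits

print(prod32, prod64)
print(float(prod64) - float(prod32))      # off by roughly 2e6 in absolute terms,
                                          # i.e. a relative error of about 2e-8
```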

We call the numerical instability caused by the floating-point representation itself intrinsic instability. This post focuses on extrinsic instability most of the time. For more details about intrinsic instability, or catastrophic cancellation, I recommend reading Bruce Dawson's posts about floating-point numbers.

Four Ways to Compute Angles Between Vectors

There are four popular ways to compute the angle between two vectors, including the unstable, trivial one introduced at the beginning.

One of these equations is simply an equivalent representation of another, obtained through the corresponding identity in this case.

To prove the correctness of equation , just note that and .

To prove the correctness of the remaining equation (and thus the one equivalent to it), we can assume both vectors are normalized without loss of generality. Then the two norms involved follow directly, and their ratio is:

This completes the proof.
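Since the numbered equations are not reproduced above, here is a sketch of what I take the four formulations to be, following the Stack Exchange answers linked in the references: the naive arccos, an atan2 of the cross and dot products, and two half-angle variants built from the normalized sum and difference. Treat the mapping to the post's numbering as an assumption.

```python
import numpy as np

def angle_arccos(a, b):
    # arccos of the normalized dot product (the unstable textbook formula).
    c = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(c, -1.0, 1.0))

def angle_atan2(a, b):
    # |a x b| = |a||b| sin(theta) and a . b = |a||b| cos(theta),
    # so atan2 of the pair recovers theta over the whole range [0, pi].
    return np.arctan2(np.linalg.norm(np.cross(a, b)), np.dot(a, b))

def angle_half_atan2(a, b):
    # For unit vectors u, v: |u - v| = 2 sin(theta/2) and |u + v| = 2 cos(theta/2).
    u = a / np.linalg.norm(a)
    v = b / np.linalg.norm(b)
    return 2.0 * np.arctan2(np.linalg.norm(u - v), np.linalg.norm(u + v))

def angle_half_arcsin(a, b):
    # Same half-angle idea with arcsin; accurate for small angles,
    # but sensitive again as theta approaches pi.
    u = a / np.linalg.norm(a)
    v = b / np.linalg.norm(b)
    return 2.0 * np.arcsin(np.clip(np.linalg.norm(u - v) / 2.0, -1.0, 1.0))

# Quick comparison at a tiny angle:
a = np.array([1.0, 0.0, 0.0])
b = np.array([np.cos(1e-8), np.sin(1e-8), 0.0])
for f in (angle_arccos, angle_atan2, angle_half_atan2, angle_half_arcsin):
    print(f.__name__, f(a, b))   # only angle_arccos collapses to 0.0
```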

Mathematical Analysis

Depending on which type of instability we consider, we can analyze and compare the relative error of each method. In the following sections, a dedicated symbol denotes the floating-point representation of the exact input value.

Extrinsic Relative Error

In the extrinsic relative error analysis, we assume the overall input carries a fixed, small relative error:

For the first method, the extrinsic relative error is:

As the angle approaches the problematic limit, the second term approaches infinity. The extrinsic relative error of the first method is therefore unbounded.

Now let's examine the second method. Its extrinsic relative error is:

Note that in the same limit the second term remains bounded, and thus the extrinsic relative error of the second (and likewise the third and fourth) method is bounded.
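A rough numerical illustration of this difference (the perturbation size and test angles are arbitrary choices of mine): feed each method's final inverse trigonometric call an input carrying a fixed relative error and compare how much of that error survives in the angle.

```python
import numpy as np

delta = 1e-10   # fixed relative error injected into the input of the inverse trig call

for theta in (1e-6, 1e-3, 1.0):
    c, s = np.cos(theta), np.sin(theta)
    # Method-1 style: arccos of a slightly perturbed cosine.
    err_acos = abs(np.arccos(np.clip(c * (1 + delta), -1.0, 1.0)) - theta) / theta
    # atan2 style: perturb sine and cosine in opposite directions.
    err_atan = abs(np.arctan2(s * (1 + delta), c * (1 - delta)) - theta) / theta
    print(theta, err_acos, err_atan)
# Near theta = 0 the arccos error explodes (the perturbed cosine even clips to 1),
# while the atan2 error stays on the order of delta.
```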

Intrinsic Relative Error

Evaluating the intrinsic relative error is much more difficult. We write the floating-point representation of any exact value with its own small relative error attached. The following notation is used in this section:

Note that:

Each ratio is bounded within a narrow interval, so we can rewrite it as one plus a small relative error term. The ratio of the two products can then be expressed as:

For the remaining factor, we notice that:

The intrinsic relative error of the first method is:

It's obvious that this term can approach infinity in the corresponding limit. Therefore, the intrinsic relative error of the first method is unbounded there.

Let's go on to the second method:

The next step is to evaluate the ratio of the two tangents, but it's not as easy as one might expect, so we instead evaluate their cosines and sines separately.

In one limit the ratio is unbounded, and in the other it can also become very large. It's worth noting, however, that since the perturbation is generally very small and the corresponding factor is very close to 1, an extremely small angle would be required for the ratio to actually blow up. Given the limited precision, the ratio is more likely to stay bounded at a moderately large value.

Next let's evaluate the third method. For brevity, we use the following notations:

Then we can calculate the relative error as follows:

The relative error can approach infinity in one limit but is well bounded in the other. Just as with the second method, in practice the relative error is more likely to stay bounded at a moderately large value.

In fact, the fourth method has relative error of exactly the same order of magnitude as the third method. Just notice that:

where the vector is normalized. Substituting this into the third method yields the same expression. Therefore, the relative error of the fourth method can approach infinity in the same limit.

Overall, the extrinsic and intrinsic relative errors of each method are summarized in the next table, where one symbol marks a method as stable (bounded), another as unstable (unbounded), and a third as theoretically unbounded but practically bounded. In each cell, the left symbol indicates the stability as the angle approaches one extreme and the right symbol as it approaches the other; for example, an unstable/stable pair means the method is unstable near the first extreme but stable near the second.

Method 1 Method 2 Method 3 Method 4
Extrinsic
Intrinsic

Experiments

First we evaluate the intrinsic relative error of each method (code is available here). Results for the first angle setting are shown in the next figure.
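The linked script is not reproduced here, but a rough sketch of such an experiment might look like the following: construct two unit vectors at a known angle, run a method in float32, and measure the relative error against the true angle (the choice of methods and test angles below is mine).

```python
import numpy as np

def angle_arccos(a, b):
    return np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))   # a, b assumed unit length

def angle_atan2(a, b):
    return np.arctan2(np.linalg.norm(np.cross(a, b)), np.dot(a, b))

def rel_err(method, theta):
    # Vectors are built in float64 and then truncated to float32, so the
    # measured error reflects the 32-bit computation.
    a = np.array([1.0, 0.0, 0.0], dtype=np.float32)
    b = np.array([np.cos(theta), np.sin(theta), 0.0], dtype=np.float32)
    return abs(float(method(a, b)) - theta) / theta

for theta in (np.pi / 4, 1e-3, 1e-6):
    print(theta, rel_err(angle_arccos, theta), rel_err(angle_atan2, theta))
```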

The first method is well bounded, whereas the other three methods show a relatively large bound. This aligns with our conjecture that, within the limited precision, the intrinsic relative error of the latter three methods still behaves with decent stability. As the angle gets much smaller, though, the relative error should become infinite.

The following figure shows the results for the other angle setting.

As expected, the first and second methods have unbounded values, while the third and fourth methods are well bounded.

The following two figures show the overall relative errors for the two angle settings, respectively.

Only the first method has an unbounded relative error in this regime, owing to its infinite extrinsic relative error there.

Conclusion

Never use the plain arccos-of-dot-product formula to compute the angle between vectors! Use the second method to get a numerically stable result. Compared to the intrinsic relative error, the extrinsic relative error seems to make the biggest difference to stability, and even if the intrinsic relative error is theoretically unbounded, you can still obtain a numerically stable result in practice with the second method. If you do care about the intrinsic relative error, the third or fourth method is all you need.

References

Floating-point numbers and catastrophic cancellation
https://en.wikipedia.org/wiki/Catastrophic_cancellation
https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
https://pharr.org/matt/blog/2019/11/03/difference-of-floats

Angle between two vectors
https://scicomp.stackexchange.com/questions/27689/numerically-stable-way-of-computing-angles-between-vectors
https://math.stackexchange.com/questions/1143354/numerically-stable-method-for-angle-between-3d-vectors/1782769#1782769

Appendix

I find it very interesting to ask this question: how many digits in decimal form are required to fully express a 32-bit floating-point number smaller than 1?

For example, the floating-point number 0.25 uses two decimal digits, since it can be written as 25 × 10⁻². The floating-point number 0.125 uses three decimal digits because it's 125 × 10⁻³. More generally, any floating-point number can be represented in either a binary way or a decimal way:

The binary representation follows from the floating-point number's internal bit-wise layout under the 32-bit IEEE 754 standard. When the exponent field is non-zero, an implicit leading one is added, just as shown in the binary representation above. When the exponent field is zero, there is no leading one, so the binary representation becomes:

Okay, let's go back to our question. To determine the digit count for an arbitrary floating-point number, we can first resolve the simpler case of numbers of the form 2⁻ⁿ. It is easy to see that the plain decimal expansion of 2⁻ⁿ has exactly n digits after the decimal point, leading zeros included. For example, 2⁻² = 0.25 has two such digits and 2⁻⁶ = 0.015625 has six. Subtracting the number of leading zeros from n then gives the answer we need.

How many leading zeros are there in the decimal form of 2⁻ⁿ? The question is equivalent to finding the largest power of ten that does not exceed 2⁻ⁿ. We can transform the corresponding inequality:

Hence, the number of decimal digits needed to precisely represent 2⁻ⁿ follows directly. We can then fully express 2⁻ⁿ in decimal form using this result:

We can test this result with some examples. When n = 2, the result is 2 (0.25), correct. When n = 3, the result is 3 (0.125), correct. When n = 6, i.e., 0.015625, the result is 5, correct.
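A direct check of these counts in Python (my own verification code, not the script from the post): since 2⁻ⁿ = 5ⁿ / 10ⁿ, its decimal expansion has exactly n digits after the point, and the significant part that remains after stripping leading zeros is just the digit string of 5ⁿ.

```python
for n in (2, 3, 6, 10, 24):
    expansion = f"{2.0 ** -n:.{n}f}"        # exact: 2**-n is a binary floating-point value
    after_point = expansion.split(".")[1]
    significant = after_point.lstrip("0")
    print(n, expansion, len(after_point), len(significant), len(str(5 ** n)))
# e.g. n = 6 -> 0.015625: 6 digits after the point, 5 of them significant (= digits of 5**6)
```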

With this formula, we know that the maximum number of digits needed to represent a 2⁻ⁿ-like floating-point number in decimal form is reached at the largest possible n, which is 149: the binary representation of that number has zero in the exponent field and a single one in the significand field, i.e., it is the smallest positive subnormal, 2⁻¹⁴⁹. Everything is perfect.

What about a general floating-point number? You may think: oh, it's easy, just take the maximum digit count over all the powers of two that contribute to the number, and that's the answer! Unfortunately, that's not the case. Take a number composed of two such powers as an example: its decimal representation needs more digits than either contributing power needs on its own.

This happens because the number of leading zeros differs for each power of two. When a larger power is added to a smaller one, the smaller one's leading zeros get filled in, thus increasing the number of valid digits.
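A small check of this effect with a made-up example (the post's own numbers are not shown above): 2⁻¹ alone needs one significant digit and 2⁻⁴ alone needs three, but their sum 0.5625 needs four, because adding 2⁻¹ fills in the leading zero that 2⁻⁴ = 0.0625 had on its own.

```python
def significant_digits(x, places):
    # Count the decimal digits after the point once leading zeros are stripped.
    return len(f"{x:.{places}f}".split(".")[1].lstrip("0"))

print(significant_digits(2.0 ** -1, 1))               # 1  -> '5'
print(significant_digits(2.0 ** -4, 4))               # 3  -> '625' (one leading zero dropped)
print(significant_digits(2.0 ** -1 + 2.0 ** -4, 4))   # 4  -> '5625'
```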

We can first decompose the number into a sum of powers of two:

Then it's equivalent to adding up the following terms:

Because the digit count is a non-decreasing function of the exponent, we can rearrange these floating-point numbers as:

The number of decimal digits needed to represent the sum of these floating-point numbers depends only on the extreme terms: the largest term fixes the leading zeros, while the smallest term contributes the last significant digit. When the largest term is added to the rest, its first significant digit makes every digit after it valid, so all digits from that position down to the last digit of the smallest term are needed to represent the floating-point number. The final result is thus:

But not all the significand bits are ones, so in practice we first need to find the largest index of a bit that is actually set, and then compute the result accordingly.

What is the maximum value of this digit count among all floating-point numbers? Note that the two quantities involved are independent, so we can take the largest value of each. The maximizing floating-point number has a zero exponent field and an all-ones significand field, i.e., it is the largest floating-point value smaller than 2⁻¹²⁶.