# Understanding and Preventing Overflow

for k in [10, 100, 1000, 1020, 1023, 1023.9, 1023.9999, 1024]:
try:
print "2^%.4f = %g" % (k, 2.0 ** k)
except OverflowError, e:
print "2^%.4f ---> %s" % (k,e)
2^10.0000 = 1024
2^100.0000 = 1.26765e+30
2^1000.0000 = 1.07151e+301
2^1020.0000 = 1.12356e+307
2^1023.0000 = 8.98847e+307
2^1023.9000 = 1.67731e+308
2^1023.9999 = 1.79757e+308
2^1024.0000 ---> (34, 'Result too large')

2^1024 是一个非常非常大的数字。你可能永远不会需要像2^1024这样的数字，除非你在用组合学。为了方便比较，这里有一些物理量的数值区间：

## 溢出源

### 常见的疑点是：加法和乘法

import numpy as np

a = np.int16(32767)
b = np.int16(2)
print "32767 + 2 = %s" % (a+b)
print "32767 * 2 = %s" % (a*b)
32767 + 2 = -32767
32767 * 2 = -2
-c:3: RuntimeWarning: overflow encountered in short_scalars
-c:4: RuntimeWarning: overflow encountered in short_scalars

If you're programming in C with a compiler that meets the C99 standard, you may be surprised to learn that unsigned integer math is guaranteed to have modulo behavior under overflow conditions, but the behavior of overflow with signed integer math is undefined.

To would-be language lawyers, the relevant sections in C99 are:

Section 6.5 item 5:

If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.

Section 6.2.5 item 9:

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

(In practice, modern compilers such as gcc and clang tend to use modulo behavior anyway for basic arithmetic calculations, but be careful, as sometimes compiler optimizations will treat comparisons or conditional expressions incorrectly even though the basic arithmetic is "correct". Symptoms of this behavior can be really tricky to recognize; the LLVM website has some explanation of this, along with other bugbears of undefined behavior. If you want to be safe, use the -fwrapv compiler flag, which both gcc and clang support; this guarantees modulo behavior with all aspects of signed arithmetic.)

The historical reason for this little quirk in the C standard, is that in the olden days some processors used ones-complement representation for signed numbers, in which case arithmetic for signed and unsigned numbers is slightly different. Nowadays twos-complement is the norm, and the implementations of addition, subtraction, and multiplication are the same for signed and unsigned when using base word lengths for the results. But one of the main principles of C, for better or for worse, is to allow machine-specific behavior to maintain efficient code, so the standard was written with no guarantees for the results of signed arithmetic overflow. Languages like Java, on the other hand, have machine-independent definitions of arithmetic. Any arithmetic operation in Java will give you the same answer no matter which processor you are running it on. The price of this guarantee is that on some processors, extra instructions will be necessary.

By the way, if you're using C for integer math, be sure to #include <stdint.h> and use the typedefs it includes, such as int16_t, uint16_t, int32_t, and uint32_t. These are portable, whereas the number of bits in a short or an int or a long may vary with processor architecture.

If you are using the fixed-point features of MATLAB, beware that the default behavior of integer overflow is to saturate at the limits of the integer range. This avoids some of the problems of overflow, but it may not give you the results you expect, if you're used to languages that provide wraparound semantics.

### PREVENTING OVERFLOW

Just because we can use -fwrapv in gcc or clang, and guarantee wraparound behavior, doesn't make it the correct behavior for an application program.

If I am controlling a valve, and I want the output to increase, and I keep adding 1 to a process control variable, so that I get 32765, 32766, 32767, -32768, -32767, etc. this creates a jump discontinuity which is BAD. The only machine-independent method of preventing this is to stay out of overflow. One approach is to upcast to a larger type size, check for overflow, and saturate the results:

#include <stdint.h>

{
int32_t z = (int32_t)x + y;
if (z > INT16_MAX)
{
z = INT16_MAX;
}
else if (z < INT16_MIN)
{
z = INT16_MIN;
}
return (int16_t)z;
}

You could also do something like this to stay within the range a 16-bit type, but it gets a little tricky, uses more operations, and I'm not 100% confident I've got my code correct:

#include <stdint.h>

{
int16_t z = x+y;
if ((y > 0) && (x > (INT16_MAX - y)))
{
z = INT16_MAX;
}
else if ((y < 0) && (x < (INT16_MIN - y)))
{
z = INT16_MIN;
}
return z;
}

Handling multiplication is similar, and is really only practical using the type-widening approach.

### 减法?

int16_t calcPI(int16_t command, int16_t measurement)
{
int16_t error = command - measurement;
/* other stuff */
}

int32_t error = (int32_t)command - measurement;

int16_t error = ((int32_t)command - measurement) / 2；

### SHIFTS?

Don't forget your shift operators << and >>. These won't cause overflow, but in C, don't forget the pesky undefined and implementation-specific behaviors. Here the relevant section from the C99 standard is section 6.5.7 paragraphs 3-5:

3 The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.

That's right! If you have a signed integer E1 containing a negative number, and you shift right, the results are implementation defined. The "sensible" thing to do is an arithmetic shift right, putting 1 bits into the most significant bits to handle sign-extension. But the C standard allows the compiler to produce a logical shift right, putting 0 bits into the most significant bits, and that's probably not what you wanted.

More recent languages like Java and Javascript give you two shift right operators: the regular >> operator for computing arithmetic shift right, and the >>> operator for computing logical shift right to produce an unsigned binary integer with zeros shifted into the most significant bits.

### IS THAT IT? WHEW!

No, that's not it! What else is there? Well, there are the increment and decrement operators ++ and --, but with -fwrapv they should work with wraparound semantics as expected.

The rest of the overflow items fall into what might be called Pathological Cases. These all involve asymmetry between INT_MIN and INT_MAX that causes unwanted aliasing.

We talked about divide earlier, and the one integer divide overflow is when you take -32768 and divide by -1.

# Let's divide -32768 / -1
a=np.int16(-32768)
b=np.int16(-1)
a/b
-32768

D'OH! We should have gotten +32768 but that just won't fit into a 16-bit signed integer, so it aliases to the int16_t bit-equivalent value -32768.

The same thing happens with unary minus:

# Let's negate -32768
-a
-32768

Then there are the fixed-point multiplication flaws along the same line. Let's say you're doing Q15 math, where integers represent numbers of the form $\frac{k}{2^{15}}$. C code looks like this:

int16_t a = ...;
int16_t b = ...;
int16_t c = ((int32_t)a * b) >> 15;

Does this work properly without overflow? Well, it does, except for one single case. $-32768 \times -32768 = 2^{30}$, and if we shift that right by 15 we get $2^{15} = 32768$. But in a signed 16-bit integer, 32768 aliases to -32768. Ouch!

### HOW DO WE DEAL WITH THIS?

#### Q15 multiplication

Well, the Q15 multiplication is a tough one. If you are absolutely certain you won't ever evaluate -32768 × -32768, go ahead and use the usual C code as is. One example is PI control loops where the gains are always nonnegative. Or if you know the range of one of the numbers doesn't include -32768 then you're fine.

Alternatively, if you're willing to have an extra shift right, you can do this:

int16_t a = ...;
int16_t b = ...;
int16_t c = ((int32_t)a * b) >> 16;

The shift-right-by-16 operation is usually faster in embedded systems than a shift by 15 by one instruction cycle, since it often maps to a "grab the high word" operation in assembly, which you usually get for free when storing memory. This operation represents $a \times b \times \frac{1}{2}$ if a and b are Q15 fixed-point integers, or $a \times b$
if one of the values is Q15 and the other is Q16.

Other than that, you need some way of saturating the intermediate product so it is safe to shift right:

int16_t a = ...;
int16_t b = ...;
int16_t c = sat30((int32_t)a * b) >> 15;

where sat30(x) limits the results so that they are within the range $-2^{30} + k_1$ to $2^{30} - 1 - k_2$, where k1 and k2 are any numbers between 0 and 32767. (Why this ambiguity? Because the least significant 15 bits don't matter, and certain processors may have quirks that allow you to execute code faster by choosing different values of k. For example, the literal value of $2^{30} - 1$ may not be possible to load in one instruction, whereas the literal value of $2^{30} - 32768$= 32767 << 15 may be possible to load in one instruction.)

#### Unary minus

The unary minus case is somewhat similar. We have a few alternatives:

If we are absolutely sure that the input cannot be -32768, we are fine leaving it as is.

Otherwise, we have to test for -32768 and convert to +32767.

Alternatively, in C we can use the bitwise complement operator ~, because ~x = -x-1 for signed integers: ~-32768 = 32767, ~0 = -1, ~-1 = 0, ~32767 = -32768. This adjustment by 1 for all inputs may not be acceptable in some applications, but in others it is fine, representing a small offset, and the complement operation is often a fast single-instruction operation.

The same idea applies to absolute value; instead of

define ABS(x) ((x) < 0 ? (-x) : (x))

we can use the bitwise complement operator:

define ABS_Q15(x) ((x) < 0 ? (~x) : (x))

Alternatively, we'd have to special-case the test for -32768.

Just a reminder: #define macros are invisible to the compiler (because they happen in the preprocessor) and are also unsafe to use with arguments that have side effects e.g. ABS(++x) which would get incremented three times.

暂无评论~~
• 请注意单词拼写，以及中英文排版，参考此页
• 支持 Markdown 格式, **粗体**、~~删除线~~、单行代码, 更多语法请见这里 Markdown 语法
• 支持表情，使用方法请见 Emoji 自动补全来咯，可用的 Emoji 请见 :metal: :point_right: Emoji 列表 :star: :sparkles:
• 上传图片, 支持拖拽和剪切板黏贴上传, 格式限制 - jpg, png, gif
• 发布框支持本地存储功能，会在内容变更时保存，「提交」按钮点击时清空
请勿发布不友善或者负能量的内容。与人为善，比聪明更重要！
Ctrl+Enter