Strange behaviour when comparing with zero

Hi everyone,

I'm currently trying to learn about floating point representation in depth, so I played around a bit. While doing so, I stumbled on some strange behaviour; I can't really work out what's happening, and I'd be very grateful for some insight. Apologies if this has been answered, I found it quite hard to google!

#include <iostream>
#include <cmath>
using namespace std;

int main(){

  float minVal = pow(2,-149); // set to smallest float possible
  
  float nextCheck = ((float)((minVal/2.0f))); // divide by two
  bool isZero = (static_cast<float>(minVal/2.0f) == 0.0f); // this thing evaluates to false when it really shouldn't...!?
  bool isZero2 = (nextCheck == 0.0f); // this evaluates to true

  cout << nextCheck << " " << isZero << " " << isZero2 << endl;
  // this outputs 0 0 1
  
  return 0;

}


Essentially what's happening is:
- I set minVal to be the smallest float that can be represented using single precision
- Dividing by 2 should yield 0 -- we're at the minimum
- Indeed, isZero2 does return true, but isZero returns false.

What's going on -- I would have thought them to be identical? Is the compiler trying to be clever, saying that dividing any number cannot possibly yield zero?

Thanks for your help!
It outputs 0 1 1 for me both in VS and on Ideone (which uses gcc, IIRC):

http://ideone.com/ksyIE8

What compiler are you using?
> I would have thought them to be identical?
minVal/2.0f == minVal/2.0f is not guaranteed to return true even if minVal is a perfectly normal floating point value (not NaN, not infinity, etc.)

> Is the compiler trying to be clever, saying that dividing any number cannot possibly yield zero?
Nope, you might be running into a precision problem. The result of an expression containing floating point values may be stored in a temporary with unspecified precision.

In the nextCheck assignment it might use the smaller precision, so minVal/2.0f results in a genuine 0.
In the first comparison it might store the result in extended precision, where minVal/2.0f is representable as a non-zero value, and depending on the rounding mode the cast to float might leave the result non-zero too.

You should not compare floating point values with == or != until you have grasped the zen of floating point.

mandatory reading: http://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
> I set minVal to be the smallest float that can be represented using single precision

We are dealing with denormal (subnormal) floating point values here.
http://en.wikipedia.org/wiki/Denormal_number

With IEEE floating point,
float minVal = pow(2,-149); // equal to or close to the smallest denormal float possible
              // it has a magnitude smaller than the smallest normal floating point number 


#include <iostream>
#include <cmath>
#include <limits>
#include <iomanip>

int main(){

  const float pow_2_149 = std::pow(2,-149); // denormal number see: http://en.wikipedia.org/wiki/Denormal_number
  const float min = std::numeric_limits<float>::min() ; // smallest positive normal value (FLT_MIN) 
  const float lowest = std::numeric_limits<float>::lowest() ; // lowest (most negative) finite value (-FLT_MAX)
  const float denorm_min = std::numeric_limits<float>::denorm_min() ; // smallest positive denormal value
  
  
  const float pow_2_149_by_2 = pow_2_149 / 2 ;  
  const float min_by_2 = min / 2 ; // smaller than smallest positive normal value, therefore denormal
  const float lowest_by_2 = lowest / 2 ; // half of -FLT_MAX: still a large-magnitude normal value, not denormal
  const float denorm_min_by_2 = denorm_min / 2 ; // smaller than smallest denormal value, therefore rounds to zero
  
  std::cout << std::scientific << std::setprecision( std::numeric_limits<float>::digits10 + 1 ) << std::boolalpha
            << pow_2_149 << ' '  << min << ' ' << lowest << ' ' << denorm_min << '\n' 
            << pow_2_149_by_2 << ' ' << min_by_2 << ' ' << lowest_by_2 << ' ' << denorm_min_by_2 << '\n' 
            << ( pow_2_149_by_2 == 0 ) << ' ' <<  ( min_by_2 == 0 ) << ' ' << ( lowest_by_2 == 0 ) << ' ' << ( denorm_min_by_2 == 0 ) << '\n' ;
}

g++ -std=c++11 -O2 -Wall -Wextra -pedantic-errors main.cpp && ./a.out
clang++ -std=c++11 -stdlib=libc++ -O2 -Wall -Wextra -pedantic-errors main.cpp -lsupc++ && ./a.out

1.4012985e-45 1.1754944e-38 -3.4028235e+38 1.4012985e-45
0.0000000e+00 5.8774718e-39 -1.7014117e+38 0.0000000e+00
true false false true

1.4012985e-45 1.1754944e-38 -3.4028235e+38 1.4012985e-45
0.0000000e+00 5.8774718e-39 -1.7014117e+38 0.0000000e+00
true false false true

http://coliru.stacked-crooked.com/a/c5c0524f0651d873
http://rextester.com/DZCY5419
Hi,

In addition to the expert advice already given, it might help to realise how floating point numbers are stored (very basically).

They are like a number in scientific format, say -1.23e-4, except in binary (very basically speaking). So there are 4 parts: the sign of the mantissa; the mantissa (1.23); the exponent sign; and the exponent. (In the IEEE formats the exponent is actually stored with a bias rather than with its own sign bit.)

The mantissa is stored as a binary fraction, normalised so that the first digit is 1 (the exponent is adjusted to suit). For example, 0.75 is 0.11 in binary because 1/2 + 1/4 is 0.75, and 7.5 is 111.1 in binary, which normalises to 1.111 x 2^2.

So this has a bunch of consequences, one of which is not every real number can be represented exactly:

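float a = 0.1f; // a is only approximately 0.1 (about 0.100000001)

if (a == 0.1){} // almost certainly false: a is promoted to double and compared with the double 0.1

if (10.0 * a == 1.0){}  // almost certainly false, for the same reason

const float PRECISION = 1e-6f; // tolerance for the comparison
const float MyNum = 10 * a;

if (std::abs (1.0 - MyNum) <  PRECISION ){} // should be true: MyNum == 1.0 to 6dp (std::abs needs <cmath>)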


Changing the type to double doesn't help - now we have 15 or 16 significant figures instead of 6 or 7.

The precision of floats is a problem: a float has only 6 or 7 significant decimal figures, so once a number gets into the millions, decimal fractions can no longer be represented accurately. For example, 1000000.1 is stored as 1000000.125, and above about 8 million (2^23) there are no fractional digits left at all.

So it is a good idea to always use double rather than float, unless you are using some graphics library that requires them, say.

double is usually good enough for most people, but there are extended precision types in libraries such as Boost and others. Choosing suitable units also helps keep values in a comfortable range: for example, astrophysicists use light years instead of metres.

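long double, where it is the 80-bit x87 format, only provides about 3 extra significant figures (18 or 19 instead of 15 or 16), but has a larger exponent range. On some compilers, such as Visual C++, it is the same as double.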

I read somewhere that exact decimal types are coming / proposed for C++17.


Hi guys,

Thanks for the responses. I have to admit I was so impatient to know the answer that I also posted on StackOverflow here: http://stackoverflow.com/questions/26690196/strange-behaviour-when-comparing-cast-float-to-zero

I think between your answers and theirs, I now have a complete picture. Thanks in particular to MiiNiPaa for the link to the interesting articles.

MiiNiPaa, you are right. Assembler code reveals that on my Windows machine, the compiler apparently uses 387 instructions which store intermediary values at a higher precision, which explains why it didn't evaluate to zero. I tried it on a different machine running Ubuntu, which gave a different answer as it was using SSE instructions and thus not using excess precision at intermediate steps.

So thanks again everyone!