slides/03-numbers.html

<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>CS 2150: 03-numbers slide set</title>
    <meta name="description" content="A set of slides for a course on Program and Data Representation">
    
    <meta name="apple-mobile-web-app-capable" content="yes" />
    <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
    <link rel="stylesheet" href="../slides/reveal.js/dist/reset.css">
    <link rel="stylesheet" href="../slides/reveal.js/dist/reveal.css">
    <link rel="stylesheet" href="../slides/reveal.js/dist/theme/black.css" id="theme">
    <link rel="stylesheet" href="../slides/css/pdr.css">
    <!-- Code syntax highlighting -->
    <link rel="stylesheet" href="../slides/reveal.js/plugin/highlight/zenburn.css">
    <!-- Printing and PDF exports -->
    <script>
      var link = document.createElement( 'link' );
      link.rel = 'stylesheet';
      link.type = 'text/css';
      link.href = window.location.search.match( /print-pdf/gi ) ? '../slides/reveal.js/css/print/pdf.scss' : '../slides/reveal.js/css/print/paper.scss';
      document.getElementsByTagName( 'head' )[0].appendChild( link );
    </script>
    <!--[if lt IE 9]>
	<script src="../slides/reveal.js/lib/js/html5shiv.js"></script>
	<![endif]-->
    <style>.reveal li { font-size:93%; line-height:120%; }</style>
  </head>

  <body>

    <div class="reveal">

      <!-- Any section element inside of this container is displayed as a slide -->
      <div class="slides">

	<section data-markdown id="cover"><script type="text/template">
# CS 2150
&nbsp;
### Program and Data Representation

<p class='titlep'>&nbsp;</p>
<div class="titlesmall"><p>
<a href="http://www.cs.virginia.edu/~mrf8t">Mark Floryan</a> (mrf8t@virginia.edu)<br>
<a href="http://www.cs.virginia.edu/~asb">Aaron Bloomfield</a> (aaron@virginia.edu)<br>
<a href="http://github.com/uva-cs/pdr">@github</a> | <a href="index.html">&uarr;</a> | <a href="./03-numbers.html?print-pdf"><img class="print" width="20" src="../slides/images/print-icon.png" style="top:0px;vertical-align:middle"></a>
</p></div>
<p class='titlep'>&nbsp;</p>

## Number Representation
	</script></section>

    <section>
<h2>CS 2150 Roadmap</h2>
<table class="wide">
  <tr><td colspan="3"><p class="center">Data Representation</p></td><td></td><td colspan="3"><p class="center">Program Representation</p></td></tr>
  <tr>
    <td class="top"><small>&nbsp;<br>&nbsp;<br>string<br>&nbsp;<br>&nbsp;<br>&nbsp;<br>int x[3]<br>&nbsp;<br>&nbsp;<br>&nbsp;<br>char x<br>&nbsp;<br>&nbsp;<br>&nbsp;<br>0x9cd0f0ad<br>&nbsp;<br>&nbsp;<br>&nbsp;<br>01101011</small></td>
    <!-- image adapted from http://openclipart.org/detail/3677/arrow-left-right-by-torfnase -->
    <td><img class="noborder" src="images/red-double-arrow.png" height="500" alt="vertical red double arrow"></td>
    <td class="top">&nbsp;<br>Objects<br>&nbsp;<br>Arrays<br>&nbsp;<br>Primitive types<br>&nbsp;<br>Addresses<br>&nbsp;<br>bits</td>
    <td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td>
    <td class="top"><small>&nbsp;<br>&nbsp;<br>Java code<br>&nbsp;<br>&nbsp;<br>C++ code<br>&nbsp;<br>&nbsp;<br>C code<br>&nbsp;<br>&nbsp;<br>x86 code<br>&nbsp;<br>&nbsp;<br>IBCM<br>&nbsp;<br>&nbsp;<br>hexadecimal</small></td>
    <!-- image adapted from http://openclipart.org/detail/3677/arrow-left-right-by-torfnase -->
    <td><img class="noborder" src="images/green-double-arrow.png" height="500" alt="vertical green double arrow"></td>
    <td class="top">&nbsp;<br>High-level language<br>&nbsp;<br>Low-level language<br>&nbsp;<br>Assembly language<br>&nbsp;<br>Machine code</td>
  </tr>
</table>
    </section>

	<section data-markdown><script type="text/template">
# Contents
&nbsp;  
[Introduction](#/introduction)  
[Radix Conversion](#/radix)  
[Machine Representation](#/machinerep)  
[Endian-ness](#/endian)  
[Integer Representation](#/integers)  
[Real Representation](#/reals)  
	</script></section>


	<section>

	  <section id="introduction" data-markdown class="center"><script type="text/template">
# Introduction
	  </script></section>

	  <section data-markdown><script type="text/template">
## Numbers vs. Numerals
- Which is bigger?
  - 5 or 8 or 12
  - What if they are sorted alphabetically?
- Which is "five"?
  - five or V or cinq or 101
- We use numerals represent numbers
	  </script></section>

	  <section>
<h2>Positional Number Systems</h2>
<ul>
  <li>Integers
    <ul>
      <li>346 = 3*10<sup>2</sup> + 4*10<sup>1</sup> + 6*10<sup>0</sup></li>
      <li>346 = 2<sup>8</sup> + 2<sup>6</sup> + 2<sup>4</sup> + 2<sup>3</sup> + 2<sup>1</sup>
	<ul><li>=1*2<sup>8</sup>+0*2<sup>7</sup>+1*2<sup>6</sup>+0*2<sup>5</sup>+1*2<sup>4</sup>+1*2<sup>3</sup>+0*2<sup>2</sup>+1*2<sup>1</sup>+0*2<sup>0</sup></li></ul></li>
      <li>\( d_{n} d_{n-1} \ldots d_{0} = \sum_{i=0}^{n} d_{i} \cdot R^{i} \)</li>
    </ul>
  </li>
  <li>Reals
    <ul>
      <li>\( d_{n} d_{n-1} \ldots d_{0} . d_{-1} d_{-2} \ldots d_{-m} = \sum_{i=-m}^{n} d_{i} \cdot R^{i} \)</li>
    </ul>
  </li>
</ul>
	  </section>

	  <section data-markdown><script type="text/template">
## Examples
- Binary (base 2): 1111<sub>2</sub>
- Ternary (base 3): 120<sub>3</sub>
- Octal (base 8): 17<sub>8</sub>
- Hexadecimal (base 16): F
	  </script></section>

	</section>


	<section>

	  <section id="radix" data-markdown class="center"><script type="text/template">
# Radix Conversion
	  </script></section>

	    <section>
<h2>Conversion between bases</h2>
<p>Radix <i>R</i> to decimal:<p>
<p>\( n = d_n R^n + \ldots + d_0 R^0 \)</p>
<p>&nbsp;</p>
<p>Decimal to radix <i>R</i>:</p>
<p>\( \frac{n}{R} = d_n R^{n-1} + \ldots + d_1 R^0 \), remainder \( d_0 \)</p>
	    </section>

	    <section>
<h2>Radix to Decimal</h2>
<p>\( 42_5 \)</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>\( 121_3 \)</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<script type="text/javascript">insertCanvas();</script>
	    </section>


	    <section>
<h2>Decimal to Radix</h2>
<p>\( 42_{10} \) to radix 5</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>\( 121_{10} \) to radix 11</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<script type="text/javascript">insertCanvas();</script>
	    </section>

	    <section data-markdown><script type="text/template">
## What is the value of "101"?
- In binary: 5
- In octal (base 8): 65
- In decimal: 101
- In hexadecimal: 257
	    </script></section>

	    <section data-markdown><script type="text/template">
## Specifying numbers in other bases
- In C/C++, any integer that begins with '0' is interpreted as octal
  - 073 = 73<sub>8</sub> = 59<sub>10</sub> 
- And any number that begins with '0x' is interpreted as hexadecimal
  - 0x73 = 73<sub>16</sub> = 115<sub>10</sub>
  - 0x3f = 3f<sub>16</sub> = 63<sub>10</sub>
  - Case does not matter (0xFF == 0xff)
- There is no (convenient) way to specify a number in binary (they are almost always done in hex)
	    </script></section>

	    <section data-markdown><script type="text/template">
## Converting between binary and hexadecmial
- Split the binary number into 4 bit chunks ('nibbles')
- Convert each one to a ***single*** hexadecimal digit
  - Converting through decimal, if necessary
  - Example: 0100 1010 1000 1101<sub>b</sub> => 0x4a8d
- Likewise, hexadecimal to binary is just converting each hex digit to 4 bits
  - Example: 0x13ac => 0001 0011 1010 1100<sub>b</sub>
- Keep in mind that it takes ***two*** hex digits to make a byte
	    </script></section>

	</section>


	<section>

	  <section id="machinerep" data-markdown class="center"><script type="text/template">
# Machine<br>Representation
	  </script></section>

	    <section data-markdown><script type="text/template">
## ENIAC
- Started 1943 - early electronic programmable computer
- Operational in 1946, and computed ballistics tables
- 17,468 vacuum tubes and used 150 kW of power 
- Earlier Computers: Z3 (Conrad Zuse) in 1941; Colossus in 1943

![ENIAC](images/03-numbers/eniac.png)
	    </script></section>

	    <section>
<h2>Directions for getting 6<h2>
<ol style="font-size:55%;line-height:120%;font-variant:normal">
<li>Choose any regular accumulator (ie. Accumulator #9).</li>
<li>Direct the Initiating Pulse to terminal 5i.</li>
<li>The initiating pulse is produced by the initiating unit's Io terminal (usually plugged into Program Line 1-1) each time the Eniac is started. Simply connect a program cable from Program Line 1-1 to terminal 5i on this Accumulator.</li>
<li>Set the Repeat Switch for Program Control 5 to 6.</li>
<li>Set the Operation Switch for Program Control 5 to ADD.</li>
<li>Set the Clear-Correct switch to C.</li> 
<li>Turn on and clear the Eniac. </li>
<li>If there are random neons illuminated in the accumulators, press the "Initial Clear" button of the Initiating device</li>
<li>Press the "Initiating Pulse Switch" that is located on the Initiating device.</li>
<li><b>Stand back.</b></li>
</ol>
	    </section>

	    <section data-markdown><script type="text/template">
## ENIAC Number Representation
- Decimal system
  - Ring of 36 vacuum tubes to store one digits (10 flip-flops to store 0-9)
  - Designed to emulate mechanical adding machine electronically
  - 20 accumulators (~registers), each stores 10-digits
- 5,000 cycles per second
  - Perform addition/subtraction between 2 accumulators each cycle
	    </script></section>

	    <section data-markdown><script type="text/template">
## Binary Number Representations
- First presented by Gottfried Leibniz, 1705 ("Explication de l'Arithmetique Binaire")
- George Boole ("Boolean" logic), 1854
- Claude Shannon's 1937 Master's thesis: implemented Boolean algebra with switches and relays
- Used by Atanasoff-Berry Computer, Colossus and Z3
	    </script></section>

	    <section>
<h2>Binary Representation</h2>
<p><i>n</i>-bit binary number: \( b_{n-1} b_{n-2} b_{n-3} \ldots b_2 b_1 b_0 \)</p>
<p>Value = \( \sum_{i=0}^{n-1} b_i \cdot 2^i \)</p>
<p>&nbsp;</p>
<p>Boolean arithmetic:</p>
<ul>
<li>0 + 0 = 0</li>
<li>0 + 1 = 1</li>
<li>1 + 0 = 1</li>
<li>1 + 1 = 0 carry 1</li>
</ul>
<p>Maximum value is \( 2^n-1 \) -- but what should \( n \) be?</p>
	    </section>

	    <section data-markdown><script type="text/template">
## What is *n*?
- Java: 
  - byte, char = 8 bits
  - short = 16 bits
  - int = 32 bits
  - long = 64 bits
- C: implementation-defined
  - `unsigned int`: can hold between 0 and `UINT_MAX`
    - `UINT_MAX` must be at least 65535
      - `UINT_MAX` is in `<climits>`
      - n >= 16, typical current machines *n* = 32 or 64
    - `sizeof(int)` will evaluate to the byte size of an int in C/C++
	    </script></section>

	</section>


	<section>

	  <section id="endian" data-markdown class="center"><script type="text/template">
# Endian-ness
	  </script></section>

	  <section data-markdown><script type="text/template">
## The Great Debate
- "Big-endian": most significant ***first*** (lowest address)
  - 1000 0000 0000 0000 = 2<sup>15</sup> = 32768
- "Little-endian": most significant ***last*** (highest address)
  - 1000 0000 0000 0000 = 2<sup>0</sup> = 1
- Which is better?
- Note that although all the *bits* are reversed, usually it is displayed with just the *bytes* reversed
	  </script></section>

	  <section data-markdown><script type="text/template">
## More on Endian-ness
- Often refers to *byte* ordering, rather than *bit* ordering
- Consider 0xdeadbeef
  - On a big-endian machine, that's 0xdeadbeef
  - On a little-endian machine, that's 0xefbeadde
- 0xdeadbeef is used as a memory allocation pattern by some OSes
	  </script></section>

	  <section data-markdown><script type="text/template">
## Endianness
- It's a "religious" argument: names taken from Big-Endians and Little-Endians in *Gulliver's Travels* who argued over which end of an egg to crack
- Different orderings problematic
  - Consider what << means in C
    - big-endian ~ multiply by 2
    - little-endian ~ divide by 2
- Some architectures support both ("bi-Endian"): PowerPC, DEC Alpha, IA/64
- There were even some middle-endian machines once upon a time
- Most Internet standards: big-endian
	  </script></section>

<!-- 
	  <section data-markdown><script type="text/template">
## Endian checking, 1 of 2
(no external source code)
```
void CheckEndian () {

  static int firsttime = 1;
  if (firsttime) {
    union {
      char charword[4];
      unsigned int intword;
    } check;

    check.charword[0] = 1;    check.charword[1] = 2;
    check.charword[2] = 3;    check.charword[3] = 4;

    // continued on next slide...
```
	  </script></section>

	  <section data-markdown><script type="text/template">
## Endian checking, 2 of 2
(no external source code)
```
#ifdef IS_BIG_ENDIAN
    if (check.intword != 0x01020304) {  /* big */
      cerr << "ERROR: Host machine is not Big-endian.\n"
           << "Exiting." << endl;
      Exit (205);    }
#else
#ifdef IS_LITTLE_ENDIAN
    if (check.intword != 0x04030201) {  /* little */
      cerr << "ERROR: Host machine is not Little-endian.\n"
           << "Exiting." << endl;  
      Exit (206);    }
#else
    cerr << "ERROR: Host machine not defined as Big or "
         << Little-endian.\nExiting." << endl;
    Exit (207);
#endif // IS_LITTLE_ENDIAN
#endif // IS_BIG_ENDIAN
    firsttime = 0;
  }
}
```
	  </script></section>

	  <section data-markdown><script type="text/template">
## Always writing a little-endian file
```
void Image::WriteInt (int value, FILE * fp) {
  union {
    int intvalue;
    struct {
#ifdef IS_LITTLE_ENDIAN
      char a, b, c, d;
#else
#ifdef IS_BIG_ENDIAN
      char d, c, b, a;
#else
#error Must define IS_BIG_ENDIAN or IS_LITTLE_ENDIAN
#endif                          // IS_BIG_ENDIAN
#endif                          // IS_LITTLE_ENDIAN
    } endian;
  } e;
  e.intvalue = value;
  fputc (e.endian.a, fp);
  fputc (e.endian.b, fp);
  fputc (e.endian.c, fp);
  fputc (e.endian.d, fp);
}
```
	  </script></section>

-->

	  <section data-markdown><script type="text/template">
## More on Endianness
- Little vs. big-endian deals with the *byte* order, not the *bit* order

![Endian-ness](images/03-numbers/endian-ness.png)
	  </script></section>

	  <section data-markdown><script type="text/template">
## Another way to think of Endianness
- Big-endian:
  - the quick brown fox jumped over the lazy dog
- Little-endian
  - dog lazy the over jumped fox brown quick the
	  </script></section>

	</section>


	<section>

	  <section id="integers" data-markdown class="center"><script type="text/template">
# Integer<br>Representation
	  </script></section>

	  <section data-markdown><script type="text/template">
## Sign-and-magnitude
- Sign Bit, sign-and-magnitude
![sign and magnitude](images/03-numbers/sign-and-magnitude.png)
- Algorithm to encode:
  - Encode absolute value of the number using *n*-1 bits
  - First bit is 1 if the number is < 0
- Problem!
  - Two representations for 0
	  </script></section>

	  <section data-markdown><script type="text/template">
## One's complement
- Sign Bit, One's Complement
![one's complement](images/03-numbers/ones-complement.png)
- Algorithm to encode:
  - Encode absolute value of the number using *n*-1 bits
  - If negative, flip all the bits
- Problem!
  - Still two representations for 0
	  </script></section>

	  <section data-markdown><script type="text/template">
## Two's complement
- Avoids two representations for zero
![two's complement](images/03-numbers/twos-complement.png)
- A negative number has it's bits fipped, and then you add 1
  - This shifts all the numbers by one, avoiding two representations for zero
- Most common means of representing integers in computers
	  </script></section>

	  <section data-markdown><script type="text/template">
## Two's complement
- Algorithm for an *n*-bit memory space:
  - Zero is *n* 0's
  - For positive numbers, encode normally in *n*-1 bits
    - Maximum value is 2<sup>*n*-1</sup>-1
    - Sign bit is zero
    - This, zero is a "positive" number! (and is all zeros)
  - For negative numbers, take the absolute value
    - Then subtract that from 2<sup>*n*</sup>, and encode that value
    - Maximum value is -2<sup>*n*-1</sup>
    - Alternatively, encode the absolute value, flip the bits, and add 1
	  </script></section>

	  <section>
<h2>Two's complement (n=8)</h2>
<table class="transparent"><tr><td class="top">
<ul>
<li>0<ul class="fragment"><li>0<sub>d</sub> = 00000000<sub>b</sub></li></ul></li>
<li>1<ul class="fragment"><li>1<sub>d</sub> = 00000001<sub>b</sub></li></ul></li>
<li>10 = 8+2<ul class="fragment"><li>10<sub>d</sub> = 00001010<sub>b</sub></li></ul></li>
<li>100 = 64+32+4<ul class="fragment"><li>100<sub>d</sub> = 01100100<sub>b</sub></li></ul></li>
<li>127 = 64+32+16+8+4+2+1<ul class="fragment"><li>127<sub>d</sub> = 01111111<sub>b</sub></li></ul></li>
</ul>
</td><td>&nbsp;</td><td class="top">
<ul>
<li>-1<ul class="fragment"><li>+1<sub>d</sub> = 00000001<sub>b</sub></li><li>-1<sub>d</sub> = 11111111<sub>b</sub></li></ul></li>
<li>-10<ul class="fragment"><li>+10<sub>d</sub> = 00001010<sub>b</sub></li><li>-10<sub>d</sub> = 11110110<sub>b</sub></li></ul></li>
<li>-100<ul class="fragment"><li>+100<sub>d</sub> = 01100100<sub>b</sub></li><li>-100<sub>d</sub> = 10011100<sub>b</sub></li></ul></li>
<li>-128<ul class="fragment"><li>+128<sub>d</sub> = 10000000<sub>b</sub></li><li>-128<sub>d</sub> = 10000000<sub>b</sub></li></ul></li>
</ul>
</td></tr></table>
	  </section>

	  <section data-markdown><script type="text/template">
## Two's complement (*n*=8)
| sign | msb | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | lsb | | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | = | 127 |
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | = | 2 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | = | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | = | 0 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | = | -1 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | = | -2 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | = | -127 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | = | -128 |
	  </script></section>

	  <section data-markdown><script type="text/template">
## Integer overflow
- Given a signed 8-bit integer: 127 = 0111 1111
  - The maximum value for that data type
- If you add 1 to that, you get 1000 0000 (= -128<sub>d</sub>)
- This is called *overflow* (or *integer overflow*)
- Other common overflows:
  - 16-bits: 32,767 + 1 = -32,768
  - 32-bits: 2,147,483,647 + 1 = -2,147,483,648
  - 64-bits: 9,223,372,036,854,775,807 + 1 = -9,223,372,036,854,775,808
- A real-word application: [*Gangnam Style* overflows INT_MAX, forces YouTube to go 64-bit](http://arstechnica.com/business/2014/12/gangnam-style-overflows-int_max-forces-youtube-to-go-64-bit/) (Dec 3, 2014)
  - If you care, it's now at 3.2 billion views (as of Sep 2018)
	  </script></section>

	  <section data-markdown><script type="text/template">
## unsigned types
- C/C++ has *unsigned* types
  - A 32-bit `int` has a range from $-2^{31} \rightarrow 2^{31}-1$
    - That's -2.15 billion to +2.15 billion
    - or maybe it's 64 bits...
  - An `unsigned int` has a range from $0 \rightarrow 2^{32}-1$
	- That's zero to +4.3 billion
    - Still $2^{32}$ values, but all positive (and zero)
- Tyipcally used for returning sizes of strings, vectors, etc.
	  </script></section>

	  <section>
<h2>A bit of an aside...</h2>
<p>This is an <i>actual</i> comment left by a graduating SEAS computer science major:</p>
<p>&nbsp;</p>
<blockquote class="fragment">
  <p>I can't count the number of times I was taught how to count in binary</p>
</blockquote>
<p>&nbsp;</p>
<p class="fragment">So this material may appear again in the CS curriculum.  Hopefully everybody will (eventually) learn it...</p>
	  </section>

	</section>


	<section>

	  <section id="reals" data-markdown class="center"><script type="text/template">
# Real<br>Representation
	  </script></section>

	  <section data-markdown><script type="text/template">
## Real Numbers
- 1/3
- &pi;
- 0.1
- 3.333333333333 * 10<sup>-1</sup>
- &radic;<span style="text-decoration:overline;">&nbsp;2&nbsp;</span>
	  </script></section>

	  <section data-markdown><script type="text/template">
## Fixed Point
- Radix point is fixed at some position
![fixed point](images/03-numbers/fixed-point.png)
- Pros:
  - Less computationally demanding
  - Good for CPUs that don't have an FPU
  - Number representation space is uniform
- Cons:
  - Less flexible
  - Nobody uses it anymore
  - Small range of values
	  </script></section>

	  <section data-markdown><script type="text/template">
## Floating Point, part 1
- Each number in scientific notation has four parts: -3.24 \* 10<sup>6</sup>
  - Sign bit (1 is negative)
  - Mantissa (the "value" of the number): 3.24 in the example above
    - Always in the range 1.0 &le; *m* < 10.0 for decimal numbers in scientific notation
    - Always in the range 1.0 &le; *m* < 2.0 in the binary representation
	  </script></section>

	  <section data-markdown><script type="text/template">
## Floating Point, part 2
- Each number in scientific notation has four parts: -3.24 \* 10<sup>6</sup>
  - Base
    - This is 10 for typically scientific notation, but 2 for computers
    - As this is always known for floats and doubles, it is omitted
  - Exponent (power of the base to multiply mantissa by): 6 in the example above
	  </script></section>

	  <section data-markdown><script type="text/template">
## Conversion overview
- We are going to convert a number such as -3.24 \* 10<sup>6</sup> to a format where:
  - The mantissa is between 1 and 2 (not 1 and 10)
  - The base is 2 (not 10)
- Example: 106 = 1.06 \* 10<sup>2</sup> = 1.65625 * 2<sup>6</sup>
- Once in that format, it is viable to encode in binary
	  </script></section>

	  <section>
<h2>IEEE 754 Floating Point<br>Single Precision (32 bits)</h2>
<ul>
<li>32 bits are split as follows:
<ul>
  <li>bit 1: sign bit, 1 means negative (1 bit)</li>
  <li>bits 2-9: exponent (8 bits)</li>
  <li>bits 10-32: mantissa (23 bits)</li>
</ul>
</li>
<li>Exponent values:
<ul>
  <li>0: zeros</li>
  <li>1-254: exponent+127<ul>
    <li>The value of 127 is called the <i>exponent offset</i> or the <i>bias</i></li></ul></li>
  <li>255: infinities, overflow, underflow, NaN</li>
</ul>
</li>
</ul>
<p>&nbsp;</p>
<p>\( \text{value} = (1-2*\text{sign}) * (1 + \text{mantissa}) * 2^{\text{exponent}-127} \)</p>
	  </section>

	  <section>
<h2>Mantissa</h2>
<p class="center" style="font-size:90%">\( b_{1}b_{2}b_{3}b_{4}b_{5}b_{6}b_{7}b_{8}b_{9}b_{10}b_{11}b_{12}b_{13}b_{14}b_{15}b_{16}b_{17}b_{18}b_{19}b_{20}b_{21}b_{22}b_{23} \)</p>
<p>&nbsp;</p>
<p class="center">\( \text{mantissa} = 1.0 + \sum^{23}_{i=1} \frac{b_i}{2^i} \)</p>
<p>&nbsp;</p>
<ul>
<li>If the bits are all 1, what is the mantissa value?
<ul><li>1 + (almost) 1 = (almost) 2: the maximum value for the mantissa</li>
<li>Actually about 1.9999999999999</li></ul></li>
<li>Minimum value is all 0's
<ul><li>That's a mantissa of 1.0</li></ul></li>
</ul>
	  </section>

	  <section data-markdown><script type="text/template">
## Converting a float from binary to decimal
- Example taken from [Wikipedia](http://en.wikipedia.org/wiki/Single_precision)
- 0x41c80000 (big-endian) = 0100 0001 1100 1000 0000 0000 0000 0000
  - Sign bit: 0 (means it's positive)
  - Exponent: 1000 0011<sub>b</sub> = 0x83 = 131<sub>d</sub>
    - Exponent offset (aka bias): 2<sup>n-1</sup>-1 = 2<sup>(8-1)</sup>-1 = 127
    - Subtract 127 from exponent value yields 4
    - Which means multiply the mantissa by 2<sup>4</sup>=16
  - Mantissa: 100 1000 0000 0000 0000 0000
	  </script></section>

	  <section data-markdown><script type="text/template">
## Converting a float from binary to decimal
- Mantissa: 100 1000 0000 0000 0000 0000
  - Each bit represents (1/2)<sup>*n*</sup> for *n* from 1 on up
    - Not 0 on up!
  - This mantissa has the first and fourth bits set
  - The '1' bits in positions 1 and 4 represent (1/2)<sup>1</sup> and (1/2)<sup>4</sup>
    - That's 0.5 + 0.0625 = 0.5625
  - Then add 1 to that to yield 1.5625
- Now multiply by 2<sup>*exponent*</sup>
  - 1.5625 \* 2<sup>4</sup> = 1.5625 \* 16 = 25
- Thus, 0x41c80000 = 25.0<sub>d</sub>
	  </script></section>

	  <section data-markdown id="maxfloatvalue"><script type="text/template">
## IEEE floating point maximum<br>(finite) positive value
- The largest float has:
  - 0 as the sign bit (it's positive)
  - 254 as the exponent (1111 1110)
    - 255 is reserved for infinities and overflows
    - That exponent is 254-127 = 127
  - All 1's for the mantissa
    - Which yields *almost* 2
  - 2 \* 2<sup>127</sup> = 2<sup>128</sup> = 3.402823 \* 10<sup>38</sup>
	  </script></section>

	  <section data-markdown><script type="text/template">
## IEEE floating point minimum<br>(finite) positive value
- The smallest float has:
  - 0 as the sign bit (it's positive)
  - Binary 1 as the exponent (0000 0001)
    - 0 is reserved for zeros
    - That exponent is 1-127 = -126
  - All 0's for the mantissa
    - Which yields 1.0 as the mantissa
  - 1 \* 2<sup>-126</sup> = 2<sup>-126</sup> = 1.175494 x 10<sup>-38</sup>
	  </script></section>

	  <section data-markdown><script type="text/template">
## Not spatially uniform!
- Consider a numerical type that holds only 3 decimal digits
  - Examples: 1.23, 12.3, 123, 1,230
    - In scientific notation, respectively: 1.23\*10<sup>0</sup>, 1.23\*10<sup>1</sup>, 1.23\*10<sup>2</sup>, 1.23\*10<sup>3</sup>
- Consider the next highest number for each of them:
  - Next highest in this numerical type, not of all real numbers
  - 1.23 => 1.24; difference is 0.01
  - 12.3 => 12.4; difference is 0.1
  - 123 => 124; difference is 1
  - 1,230 => 1,240; difference is 10
- Depending on the exponent, the difference between two successive numbers is not the same!
	  </script></section>

	  <section data-markdown><script type="text/template">
## Floating point numbers are<br>not spatially uniform
- Consider two positive floats
- Both have a mantissa with just the last bit set
- One has an exponent of -126 (i.e. 1), the other 127 (i.e. 254)
- Now flip the second to last mantissa bit in both numbers
  - The number with the higher exponent will "gain" more than the number with a lower exponent
	  </script></section>

	  <section data-markdown><script type="text/template">
## Converting a float<br>from decimal to binary
- Sign: 1 if negative (signed), 0 if non-negative (unsigned)
- Exponent (*e*): find what power of 2 is required to bring the number to 1.0 <= *x* < 2.0
  - If you have to multiply, then it's negative
  - If you have to divide, then it's positive
  - Add 127 to that value
- The mantissa is *f/2<sup>e</sup>*
  - Note that *e* is the exponent before adding 127 to it
  - Subtract 1 from the mantissa (so that 0.0 &le; *m* < 1.0)
  - Convert to closest representation using powers of &frac12;
  - Similar to converting a base *n* number to base 10
	  </script></section>

	  <section data-markdown><script type="text/template">
## Converting a float
Source code: [float_to_hex.cpp](code/03-numbers/float_to_hex.cpp.html) ([src](code/03-numbers/float_to_hex.cpp)) (we want 32-bit pointers, so we compile with `-m32`)

```
#include <iostream>
using namespace std;

union foo {
    float f;
    int *x;
} bar;

int main() {
    bar.f = 42.125;
    cout << bar.x << endl; // prints in big-endian
    return 0;
}
```
Output: 0x42288000
	  </script></section>

	  <section data-markdown><script type="text/template">
## Converting a float<br>from decimal to binary
- Take 42.125 (see [float_to_hex.cpp](code/03-numbers/float_to_hex.cpp.html) ([src](code/03-numbers/float_to_hex.cpp)))
  - Sign is 0 (it's positive)
  - Convert the float to a sum of powers of 2
    - 42.125 = 32 + 8 + 2 + 1/8
    -  = 2<sup>5</sup> + 2<sup>3</sup> + 2<sup>1</sup> + 2<sup>-3</sup>
    - (details on the next slides)
  - Convert to binary (base 2)
    - 2<sup>5</sup> + 2<sup>3</sup> + 2<sup>1</sup> + 2<sup>-3</sup> = 101010.001<sub>b</sub>
  - Move the decimal to put in scientific notation
    - 101010.001<sub>b</sub> = 1.01010001<sub>b</sub> * 2<sup>5</sup>
    - It slides over 5 spots, so the exponent is 5
</script></section>

	  <section data-markdown><script type="text/template">
## Converting a number to powers of 2
- Consider 42.125
- For the integer portion (42):
  - Repeatedly subtract the highest power of 2 that is less than or equal to the number
  - 42 - 32 => 10 - 8 => 2 - 2 => 0
  - Thus, 42 = 32 + 8 + 2 = 2<sup>5</sup> + 2<sup>3</sup> + 2<sup>1</sup>
</script></section>

	  <section data-markdown><script type="text/template">
## Converting a number to powers of 2
- Consider 42.65625 (NOT the number on the previous slide)
- For the decimal portion (0.65625):
  - Repeatedly subtract the *lowest* power of 1/2 that is less than or equal to the number
  - Often easist done in rational form: 0.65625 = 21/32
  - 21/32 - 16/32 (aka 1/2) => 5/32 - 4/32 (aka 1/8) => 1/32 - 1/32 => 0
  - Thus, 0.65625 = 21/32 = 1/2 + 1/8 + 1/32 = 2<sup>-1</sup> + 2<sup>-3</sup> + 2<sup>-5</sup>
- If we consider 42.125 (the number on the previous slide), the decimal portion is just 0.125 = 1/8 = 2<sup>-3</sup>
</script></section>

	  <section data-markdown><script type="text/template">
## Converting a float<br>from decimal to binary (again)
- Take 42.125 (see [float_to_hex.cpp](code/03-numbers/float_to_hex.cpp.html) ([src](code/03-numbers/float_to_hex.cpp)))
  - Sign is 0 (it's positive)
  - Convert the float to a sum of powers of 2
    - 42.125 = 32 + 8 + 2 + 1/8
    -  = 2<sup>5</sup> + 2<sup>3</sup> + 2<sup>1</sup> + 2<sup>-3</sup>
    - (details on the next slide)
  - Convert to binary (base 2)
    - 2<sup>5</sup> + 2<sup>3</sup> + 2<sup>1</sup> + 2<sup>-3</sup> = 101010.001<sub>b</sub>
  - Move the decimal to put in scientific notation
    - 101010.001<sub>b</sub> = 1.01010001<sub>b</sub> * 2<sup>5</sup>
    - It slides over 5 spots, so the exponent is 5
</script></section>

	  <section data-markdown><script type="text/template">
## Converting a float<br>from decimal to binary
- Take 1.<span class='skyblue'>01010001</span><sub>b</sub> * 2<sup><span class='red'>5</span></sup>
  - Mantissa is the bits after the decimal
    - Ignore the leading 1. Why?
    - Append as many trailing zeros as necessary
  - Exponent is <span class='red'>5</span>+bias = <span class='red'>5</span>+127 = 132 (<span class='green'>1000 0100</span><sub>b</sub>)
  - In binary, then, it's:
    - <span class='pink'>0</span><span class='green'>100 0010 0</span><span class="skyblue">010 1000 1000 0000 0000 0000</span>
      - The <span class='pink'>sign</span>, <span class='green'>exponent</span> and <span class='skyblue'>mantissa</span> are colored differently for clarity
  - In hex: 0x42288000
	  </script></section>

	  <section data-markdown><script type="text/template">
## Example
- 1/10 = 0.1 (Decimal)
- What is this in binary?
  - 1/10 &asymp; 1/16 + 1/32
    - 1/16 + 1/32 = 3/32
  - 3/32 is off from 1/10 by 0.2/32
  - 0.2/32 = 2/320 &asymp; 1/256 + 1/512
  - But this does not exactly equal 0.1 yet...
	  </script></section>

	  <section data-markdown><script type="text/template">
## Dividing one by ten in binary
![long division](images/03-numbers/long-division.png)
Even common decimals like 0.1 cannot be represented exactly!
	  </script></section>

	  <section data-markdown><script type="text/template">
## Floating point precision in Java
What gets printed? (source: [FloatTest.java](code/03-numbers/FloatTest.java.html) ([src](code/03-numbers/FloatTest.java)))
```
class FloatTest {
    public static void main (String args[]) {
        // There are 10 0.1's in the next statement
        double y = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 +
                   0.1 + 0.1 + 0.1 + 0.1 + 0.1;
        System.out.println (y);
    }
}

```
Hint: it's not 1.0!
	  </script></section>

	  <section data-markdown><script type="text/template">
## Comparing floats & doubles
- Consider, in Java:
```
double a = 1;
double b = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1
               + 0.1 + 0.1 + 0.1 + 0.1;
double c = .9999999999999999;
```
- Two true expressions! 
  - `c == b`
  - `b != a`
- Two false expressions!
  - `a == b`
  - `b != c`
- Problem is the finite precision of floating-point types
  - Instead with the ordering operators for closeness
	  </script></section>

	  <section data-markdown><script type="text/template">
## How to solve this
- Don't compare floating-point values if you can help it!
  - Both doubles and floats
- Need to test if the two doubles are "close" in value
```
// Java code
final double EPSILON = 0.000001;
boolean foo = Math.abs (a-b) < EPSILON;
```
```
// C++: #include <math.h> & compile with -lm
#define EPSILON 0.000001
bool foo = fabs (a-b) < EPSILON;
```
- A float has 7 decimal places of (printed) accuracy
  - And two floats may differ by a value that is at 8 decimal places
  - It prints as 1.0, but is really 0.9999999999
	  </script></section>

	  <section data-markdown><script type="text/template">
## Floating point rounding errors
- Add 0.33333333333 three times, and you don't get 1.0
  - 1/3 cannot be represented via a (finite) decimal num
    - Any denominator that has factors that are not factors of the radix
    - Decimal radix factors: 2 an 5
    - Good: 25 (factors: 5, 5), 10 (factors: 2, 2, 5, 5), etc.
    - Bad: 1/3, 1/7, 1/11, 1/12
  - Same thing with 0.1 (and many others!) in binary
    - Any denominator that has factors that are not factors of the radix (which is just 2!)
    - I.e. 1/3, 1/5, 1/7, etc. - anything whose denominator is not a power of 2
	  </script></section>

	  <section>
<h2>Floating point rounding errors</h2>
<ul>
<li>0.1 is stored (as a 32 bit float) as
<ul><li>mantissa = 100 1100 1100 1100 1100 1101</li></ul></li>
<li>Or: \( \frac{1+\frac{1}{2^1}+\frac{1}{2^4}+\frac{1}{2^5}+\frac{1}{2^8}+\frac{1}{2^9}+\frac{1}{2^{12}}+\frac{1}{2^{13}}+\frac{1}{2^{16}}+\frac{1}{2^{17}}+\frac{1}{2^{20}}+\frac{1}{2^{21}}+\frac{1}{2^{23}}}{2^4} \)<br>&nbsp;</li>
<li>Which is exactly equal to:
<ul><li> 0.100000001490116119384765625</li>
<li>But only the first 7 digits, .1000000, are printed</li></ul></li>
<li>0.1 (exactly) is a finite number in decimal, but repeating in binary</li>
</ul>
	  </section>

	  <section data-markdown><script type="text/template">
## How to solve this
- Store number as a rational number, if possible
  - Works for 1/3, 1/7, etc.
- Use more digits
  - A "bigger" floating point number
  - BigFloat (analgous to BigInteger)
  - But these still have a finite number of digits!
	  </script></section>

	  <section data-markdown><script type="text/template">
## Real example: Patriot Missile
- Gulf War I (1990-1991)
- Failed to intercept incoming Iraqi scud missile (Feb 25, 1991); both travel at about Mach 5
- 28 American soldiers killed
![patriot missile launch](images/03-numbers/patriot-missile.jpg)
- GAO report [here](http://www.fas.org/spp/starwars/gao/im92026.htm); image from [Wikipedia](http://en.wikipedia.org/wiki/File:Patriot_missile_launch_b.jpg)
	  </script></section>

	  <section data-markdown><script type="text/template">
## Patriot Design
- Intended to operate only for a few hours at a time
  - But was left running for ~100 hours prior to incident
  - Designed to defend Europe from Soviet weapons
- Four 24 bit fixed-point registers (1970s design)
  - Meaning there were about 6 digits of precision after the decimal place
  - Although a `float` has somewhat similar precision
- Kept time with integer counter: incremented every 1/10 sec
- To calculate speed of incoming missile to predict future positions:
  - velocity = (loc<sub>1</sub> - loc<sub>0</sub>) / ((count<sub>1</sub> - count<sub>0</sub>) * 0.1)
- But cannot represent 0.1 exactly!
	  </script></section>

	  <section>
<h2>Floating Imprecision</h2>
<ul>
  <li>24 bits in a fixed-point register:<br>
\( 0.1 \Rightarrow \frac{1+\frac{1}{2^1}+\frac{1}{2^4}+\frac{1}{2^5}+\frac{1}{2^8}+\frac{1}{2^9}+\frac{1}{2^{12}}+\frac{1}{2^{13}}+\frac{1}{2^{16}}+\frac{1}{2^{17}}+\frac{1}{2^{20}}+\frac{1}{2^{21}}}{2^4} \\
= 209715 / 2097152 \\
= 0.099999904632568359375 \)
<ul><li>Error is 0.2/2097152 = 1/10485760</li></ul></li>
<li>One hour = 3,600 seconds = 36,000 tenths of a second<ul>
    <li>At 1/10485760 time error per tenth of a second, that yields 0.0034 seconds of time error per hour</li>
    <li>100 hours: 0.34s</li></ul></li>
<li>The Scud missile was traveling at 1,676 m/s (Mach 4.93)<ul>
    <li>1,676 m/s * 0.34 seconds = 570 meters (1,870 feet)</li>
    <li>The Patriot thus missed the target by over a half a km</li></ul></li>
</ul>
	  </section>

	  <section data-markdown><script type="text/template">
## The bug fix...
> Two weeks before the incident, Army officials received Israeli data indicating some loss in accuracy after the system had been running for 8 consecutive hours. 
> Consequently, Army officials modified the software to improve the system's accuracy. 
> However, the modified software did not reach Dhahran until February 26, 1991 --  the day after the Scud incident.
>
> \- GAO Report
	  </script></section>

	  <section data-markdown><script type="text/template">
## Better Floats: More Bits
- IEEE 754 Double Precision (64 bits)
  - A `float` has about 7 decimal places of accuracy, and a `double` has about 15
- 64 bits are split as follows:
  - bit 1: sign bit, 1 means negative (1 bit)
  - bits 2-12: exponent (11 bits)
  - bits 13-64: mantissa (52 bits)
  - Exponent offset (aka bias) is 2<sup>*e*-1</sup>-1 = 1023
- Single precision: 0.1 = 209715 / 2097152
  - Error = 9.5 \* 10<sup>-8</sup>  (at least 20 hours to miss target)
- Double precision: 
  - 0.1 = 56294995342131 / 562949953421312
  - Error = 3.608 \* 10<sup>-16</sup> (2,172,375,450 ***years*** to miss)
	  </script></section>

	  <section data-markdown><script type="text/template">
## Other Floating Points
- IEEE 754r quad-precision format
  - 128 bits (1 sign, 15 exponent, 112 mantissa)
  - About 34 decimal places of accuracy
- IBM Floating Point ("Hexadecimal")
  - Use more bits in fraction, fewer in exponent (7/24 and 7/56 instead of 8/23 and 11/52)
- Decimal Formats (IEEE 754d) 
  - 1 decimal digit into 4 binary digits
  - Cowlishaw encoding:
    - Exact representation of decimals (e.g., 0.1)
    - 3 decimal digits (0-999) into 10 binary digits (0-1023) (24 wasted out of 1024)
	  </script></section>

	  <section data-markdown><script type="text/template">
## Smaller Floating Point
- 16-bit floating point representations
  - Minifloat: 1 sign, 5-bit exponent, 10-bit mantissa
  - Range from 2.98 * 10<sup>-8</sup> to 65504
  - Your graphics card uses this (if you have a good one)

![nvidia logo](images/03-numbers/nvidia.png)
	  </script></section>

	</section>


      </div>

    </div>

    <script src='../slides/reveal.js/dist/reveal.js'></script><script src='../slides/reveal.js/plugin/zoom/zoom.js'></script><script src='../slides/reveal.js/plugin/notes/notes.js'></script><script src='../slides/reveal.js/plugin/search/search.js'></script><script src='../slides/reveal.js/plugin/markdown/markdown.js'></script><script src='../slides/reveal.js/plugin/highlight/highlight.js'></script><script src='../slides/reveal.js/plugin/math/math.js'></script>
    <script src="js/settings.js"></script>

  </body>
</html>