-
Notifications
You must be signed in to change notification settings - Fork 43
/
Adder_Subtractor_Binary_Multiprecision_Saturating.html
399 lines (331 loc) · 14.3 KB
/
Adder_Subtractor_Binary_Multiprecision_Saturating.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
<html>
<head>
<link rel="shortcut icon" href="./favicon.ico">
<link rel="stylesheet" type="text/css" href="./style.css">
<link rel="canonical" href="./Adder_Subtractor_Binary_Multiprecision_Saturating.html">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="A signed staturating binary integer adder/subtractor, which does arithmetic over `WORD_WIDTH` words as a sequence of smaller `STEP_WORD_WIDTH` operations to save area and increase operating speed at the price of extra latency.">
<title>Adder Subtractor Binary Multiprecision Saturating</title>
</head>
<body>
<p class="inline bordered"><b><a href="./Adder_Subtractor_Binary_Multiprecision_Saturating.v">Source</a></b></p>
<p class="inline bordered"><b><a href="./legal.html">License</a></b></p>
<p class="inline bordered"><b><a href="./index.html">Index</a></b></p>
<h1>Multiprecision Binary Integer Adder/Subtractor, with Saturation</h1>
<p>A signed staturating binary integer adder/subtractor, which does arithmetic over
<code>WORD_WIDTH</code> words as a sequence of smaller <code>STEP_WORD_WIDTH</code> operations to
save area and increase operating speed at the price of extra latency.</p>
<p>The main use of this circuit is to avoid CAD problems which can emerge at
large integer bit widths (e.g.: 128 bits): the area of added pipeline
registers would become quite large, and the CAD tools can fail to retime
them completely into the extra-wide adder logic, thus failing to reach
a high operating frequency and slowing down the rest of the system. <em>There
are notes in the code which point out the critical paths you can expect to
see (or not).</em></p>
<p>Also, if we don't need a result every cycle, regular pipelining is
wasteful: it unnecessarily duplicates the input/output data storage, and
makes poor use of the adder/subtractor logic (e.g.: the least-significant
bits are used once, then sit idle for multiple cycles while the higher bits
are computed).</p>
<p>Since the result latency depends on the ratio of <code>WORD_WIDTH</code> to
<code>STEP_WORD_WIDTH</code> and whether that ratio is a whole integer, the inputs are
set with a ready/valid handshake, and can be updated after the output
handshake completes.</p>
<p>Addition/subtraction is selected with <code>add_sub</code>: 0 for an add (<code>A+B</code>), and
1 for a subtract (<code>A-B</code>). This assignment conveniently matches the
convention of sign bits. <em>Note that the <code>overflow</code> bit is only meaningful
for signed numbers. For unsigned numbers, use <code>carry_out</code> instead.</em></p>
<h2>Saturation</h2>
<p>If the result of the addition/subtraction falls outside of the inclusive
minimum or maximum limits, the result is clipped (saturated) to the nearest
exceeded limit. <strong>The maximum limit must be greater or equal than the
minimum limit.</strong> If the limits are reversed, such that limit_max
< limit_min, the result will be meaningless.</p>
<p>Internally, we perform the addition/subtraction on WORD_WIDTH + 1 bits.
Since the limits must be within the range of WORD_WIDTH-wide numbers, there
can never be an overflow or underflow. Instead, we signal if we have
reached or would have exceeded the limits at the last incrementation. The
saturation logic is a pair of simple signed comparisons in the larger
range. This is also likely optimal, as the delay from one extra bit of
carry is less than that of any extra logic to handle overflows.</p>
<p>Also, we internally perform the addition/subtraction as unsigned so we can
easily handle the carry_in bit. The signed comparisons are done in
a separate module which implements signed/unsigned comparisons as raw
logic, to avoid having to make sure all compared values are declared
signed, else the comparison silently defaults to unsigned!</p>
<h2>Ports and Constants</h2>
<pre>
`default_nettype none
module <a href="./Adder_Subtractor_Binary_Multiprecision_Saturating.html">Adder_Subtractor_Binary_Multiprecision_Saturating</a>
#(
parameter WORD_WIDTH = 0,
parameter STEP_WORD_WIDTH = 0,
parameter EXTRA_PIPE_STAGES = 0 // Use if predicates are critical path
)
(
input wire clock,
input wire clock_enable,
input wire clear,
input wire input_valid,
output wire input_ready,
input wire [WORD_WIDTH-1:0] limit_max,
input wire [WORD_WIDTH-1:0] limit_min,
input wire add_sub, // 0/1 -> A+B/A-B
input wire [WORD_WIDTH-1:0] A,
input wire [WORD_WIDTH-1:0] B,
output wire output_valid,
input wire output_ready,
output reg [WORD_WIDTH-1:0] sum,
output reg carry_out,
output reg [WORD_WIDTH-1:0] carries,
output wire at_limit_max,
output wire over_limit_max,
output wire at_limit_min,
output wire under_limit_min
);
localparam WORD_ZERO = {WORD_WIDTH{1'b0}};
localparam WORD_WIDTH_EXTENDED = WORD_WIDTH + 1;
localparam WORD_ZERO_EXTENDED = {WORD_WIDTH_EXTENDED{1'b0}};
initial begin
sum = WORD_ZERO;
carry_out = 1'b0;
carries = WORD_ZERO;
end
</pre>
<p>Extend the inputs to prevent overflow over their original range. We extend
them as signed integers, despite declaring them as unsigned.</p>
<pre>
wire [WORD_WIDTH_EXTENDED-1:0] A_extended;
<a href="./Width_Adjuster.html">Width_Adjuster</a>
#(
.WORD_WIDTH_IN (WORD_WIDTH),
.SIGNED (1),
.WORD_WIDTH_OUT (WORD_WIDTH_EXTENDED)
)
extend_A
(
.original_input (A),
.adjusted_output (A_extended)
);
wire [WORD_WIDTH_EXTENDED-1:0] B_extended;
<a href="./Width_Adjuster.html">Width_Adjuster</a>
#(
.WORD_WIDTH_IN (WORD_WIDTH),
.SIGNED (1),
.WORD_WIDTH_OUT (WORD_WIDTH_EXTENDED)
)
extend_B
(
.original_input (B),
.adjusted_output (B_extended)
);
</pre>
<p>Extend the limits in the same way, as if signed integers. </p>
<pre>
wire [WORD_WIDTH_EXTENDED-1:0] limit_max_extended;
<a href="./Width_Adjuster.html">Width_Adjuster</a>
#(
.WORD_WIDTH_IN (WORD_WIDTH),
.SIGNED (1),
.WORD_WIDTH_OUT (WORD_WIDTH_EXTENDED)
)
extend_limit_max
(
.original_input (limit_max),
.adjusted_output (limit_max_extended)
);
wire [WORD_WIDTH_EXTENDED-1:0] limit_min_extended;
<a href="./Width_Adjuster.html">Width_Adjuster</a>
#(
.WORD_WIDTH_IN (WORD_WIDTH),
.SIGNED (1),
.WORD_WIDTH_OUT (WORD_WIDTH_EXTENDED)
)
extend_limit_min
(
.original_input (limit_min),
.adjusted_output (limit_min_extended)
);
</pre>
<p>Then select and perform the addition or subtraction in the usual way.
NOTE: we don't capture the extended <code>carry_out</code>, as it will never be set
properly since the inputs are too small for the <code>WORD_WIDTH_EXTENDED</code>. We
compute the real <code>carry_out</code> separately.</p>
<pre>
wire [WORD_WIDTH_EXTENDED-1:0] sum_extended;
wire [WORD_WIDTH_EXTENDED-1:0] carries_extended;
wire output_valid_addsub;
// Correct, but confuses Veril*tor
// verilator lint_off UNOPTFLAT
wire output_ready_addsub;
// verilator lint_on UNOPTFLAT
<a href="./Adder_Subtractor_Binary_Multiprecision.html">Adder_Subtractor_Binary_Multiprecision</a>
#(
.WORD_WIDTH (WORD_WIDTH_EXTENDED),
.STEP_WORD_WIDTH (STEP_WORD_WIDTH)
)
extended_add_sub
(
.clock (clock),
.clock_enable (clock_enable),
.clear (clear),
.input_valid (input_valid),
.input_ready (input_ready),
.add_sub (add_sub), // 0/1 -> A+B/A-B
.A (A_extended),
.B (B_extended),
.output_valid (output_valid_addsub),
.output_ready (output_ready_addsub),
// verilator lint_off PINCONNECTEMPTY
.sum (sum_extended),
.carry_out (),
.carries (carries_extended),
.overflow ()
// verilator lint_on PINCONNECTEMPTY
);
</pre>
<p>Since we extended the width by one bit, the original <code>carry_out</code> is now the
carry into that extra bit. Let's also get the original <code>carries</code> into each
bit.</p>
<pre>
always @(*) begin
carry_out = carries_extended [WORD_WIDTH_EXTENDED-1];
carries = carries_extended [WORD_WIDTH-1:0];
end
</pre>
<p>Check if <code>sum_extended</code> is past the min/max limits. Using these arithmetic
predicate modules removes the need to get all the signed declarations
correct, else we accidentally and silently fall back to unsigned
comparisons!</p>
<pre>
wire input_valid_max;
wire input_ready_max;
wire input_valid_min;
wire input_ready_min;
wire [WORD_WIDTH_EXTENDED-1:0] sum_extended_max;
wire [WORD_WIDTH_EXTENDED-1:0] sum_extended_min;
</pre>
<p>Since we are doing multi-cycle calculations here, we could multiplex one
predicates module, but that's more stateful and complex. So let's run two
in parallel, as usual, and the fork/join takes care of all the
synchronization and data handling.</p>
<pre>
localparam LIMIT_CHECK_COUNT = 2;
<a href="./Pipeline_Fork_Eager.html">Pipeline_Fork_Eager</a>
#(
.WORD_WIDTH (WORD_WIDTH_EXTENDED),
.OUTPUT_COUNT (LIMIT_CHECK_COUNT)
)
fork_to_limit_checks
(
.clock (clock),
.clear (clear),
.input_valid (output_valid_addsub),
.input_ready (output_ready_addsub),
.input_data (sum_extended),
.output_valid ({input_valid_max, input_valid_min}),
.output_ready ({input_ready_max, input_ready_min}),
.output_data ({sum_extended_max, sum_extended_min})
);
wire output_valid_max;
wire output_ready_max;
wire at_limit_max_raw;
wire over_limit_max_raw;
<a href="./Arithmetic_Predicates_Binary_Multiprecision.html">Arithmetic_Predicates_Binary_Multiprecision</a>
#(
.WORD_WIDTH (WORD_WIDTH_EXTENDED),
.STEP_WORD_WIDTH (STEP_WORD_WIDTH),
.PIPELINE_DEPTH (EXTRA_PIPE_STAGES)
)
limit_max_check
(
.clock (clock),
.clock_enable (clock_enable),
.clear (clear),
.input_valid (input_valid_max),
.input_ready (input_ready_max),
.A (sum_extended_max),
.B (limit_max_extended),
.output_valid (output_valid_max),
.output_ready (output_ready_max),
// verilator lint_off PINCONNECTEMPTY
.A_eq_B (at_limit_max_raw),
.A_lt_B_unsigned (),
.A_lte_B_unsigned (),
.A_gt_B_unsigned (),
.A_gte_B_unsigned (),
.A_lt_B_signed (),
.A_lte_B_signed (),
.A_gt_B_signed (over_limit_max_raw),
.A_gte_B_signed ()
// verilator lint_on PINCONNECTEMPTY
);
wire output_valid_min;
wire output_ready_min;
wire at_limit_min_raw;
wire under_limit_min_raw;
<a href="./Arithmetic_Predicates_Binary_Multiprecision.html">Arithmetic_Predicates_Binary_Multiprecision</a>
#(
.WORD_WIDTH (WORD_WIDTH_EXTENDED),
.STEP_WORD_WIDTH (STEP_WORD_WIDTH),
.PIPELINE_DEPTH (EXTRA_PIPE_STAGES)
)
limit_min_check
(
.clock (clock),
.clock_enable (clock_enable),
.clear (clear),
.input_valid (input_valid_min),
.input_ready (input_ready_min),
.A (sum_extended_min),
.B (limit_min_extended),
.output_valid (output_valid_min),
.output_ready (output_ready_min),
// verilator lint_off PINCONNECTEMPTY
.A_eq_B (at_limit_min_raw),
.A_lt_B_unsigned (),
.A_lte_B_unsigned (),
.A_gt_B_unsigned (),
.A_gte_B_unsigned (),
.A_lt_B_signed (under_limit_min_raw),
.A_lte_B_signed (),
.A_gt_B_signed (),
.A_gte_B_signed ()
// verilator lint_on PINCONNECTEMPTY
);
localparam LIMIT_COUNT = 1 + 1;
<a href="./Pipeline_Join.html">Pipeline_Join</a>
#(
.WORD_WIDTH (LIMIT_COUNT),
.INPUT_COUNT (LIMIT_CHECK_COUNT)
)
join_from_limit_checks
(
.clock (clock),
.clear (clear),
.input_valid ({output_valid_max, output_valid_min}),
.input_ready ({output_ready_max, output_ready_min}),
.input_data ({at_limit_max_raw, over_limit_max_raw, at_limit_min_raw, under_limit_min_raw}),
.output_valid (output_valid),
.output_ready (output_ready),
.output_data ({at_limit_max, over_limit_max, at_limit_min, under_limit_min})
);
</pre>
<p>After, clip the sum to the limits. This must be done as a signed comparison
so we can place the limits anywhere in the positive or negative integers,
so long as <code>limit_max >= limit_min</code>, as signed integers. And finally,
truncate the output back to the input <code>WORD_WIDTH</code>.</p>
<pre>
reg [WORD_WIDTH_EXTENDED-1:0] sum_extended_clipped = WORD_ZERO_EXTENDED;
always @(*) begin
sum_extended_clipped = (over_limit_max == 1'b1) ? limit_max_extended : sum_extended;
sum_extended_clipped = (under_limit_min == 1'b1) ? limit_min_extended : sum_extended_clipped;
sum = sum_extended_clipped [WORD_WIDTH-1:0];
end
endmodule
</pre>
<hr>
<p><a href="./index.html">Back to FPGA Design Elements</a>
<center><a href="https://fpgacpu.ca/">fpgacpu.ca</a></center>
</body>
</html>