diff --git a/doc/README.md b/doc/README.md
index 8dcf21fd..3cf1d52c 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -95,6 +95,7 @@ Some in-development items will have opened issues, as well. Feel free to create
 - NoC's
 - Coherent
 - Non-Coherent
+ - [Reduction Tree](./components/reduction_tree.md)
 - Memory
 - [Register File](./components/memory.md#register-files)
 - [Masking](./components/memory.md#masks)
diff --git a/doc/components/floating_point.md b/doc/components/floating_point.md
index 3a97e5f7..641df231 100644
--- a/doc/components/floating_point.md
+++ b/doc/components/floating_point.md
@@ -22,7 +22,7 @@ Appropriate string representations, comparison operations, and operators are ava

 ### Floating Point Constants

-The various IEEE constants representing corner cases of the field of floating-point values for a given size of [FloatingPointValue](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPointValue-class.html): infinities, zeros, limits for normal (e.g. mantissa in the range of $[1,2)$ and sub-normal numbers (zero exponent, and mantissa <1).
+The various IEEE constants representing corner cases of the field of floating-point values for a given size of [FloatingPointValue](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPointValue-class.html): infinities, zeros, limits for normal (e.g. mantissa in the range of $[1,2)$) and sub-normal numbers (zero exponent, and mantissa <1).

 For any basic arbitrary width `FloatingPointValue` ROHD-HCL supports the following constants in that format.
@@ -36,8 +36,8 @@ For any basic arbitrary width `FloatingPointValue` ROHD-HCL supports the followi
 - `one`: The number one
 - `smallestLargerThanOne`: Smallest number greater than one
 - `largestNormal`: Largest positive number, most positive exponent, full mantissa
-- `infinity`: Largest possible number: all 1s in the exponent, all 0s in the mantissa
-- `nan`: Not a Number, demarked by all 1s in exponent and any 1 in mantissa (we use the LSB)
+- `infinity`: Largest possible number: all 1s in the exponent, all 0s in the mantissa
+- `nan`: Not a Number, designated by all 1s in exponent and any 1 in mantissa (we use the LSB)

 ### Special subtypes

@@ -73,6 +73,7 @@ A very basic [FloatingPointMultiplierSimple] component is available which does n

 It has options to control its performance:

-- 'radix': used to specify the radix of the Booth encoder (default radix=4: options are [2,4,8,16])'.
-- adderGen': used to specify the kind of [Adder] used for key functions like the mantissa addition. Defaults to [NativeAdder], but you can select a [ParallelPrefixAdder] of your choice.
-- 'ppTree': used to specify the type of ['ParallelPrefix'](https://intel.github.io/rohd-hcl/rohd_hcl/ParallelPrefix-class.html) used in the pther critical functions like leading-one detect.
+- `radix`: used to specify the radix of the Booth encoder (default radix=4: options are [2,4,8,16]).
+- `adderGen`: used to specify the kind of [Adder] used for key functions like the mantissa addition. Defaults to [NativeAdder], but you can select a [ParallelPrefixAdder] of your choice.
+- `seGen`: type of sign extension routine used, base class is [PartialProductSignExtension].
+- `ppTree`: used to specify the type of ['ParallelPrefix'](https://intel.github.io/rohd-hcl/rohd_hcl/ParallelPrefix-class.html) used in the other critical functions like leading-one detect.
diff --git a/doc/components/multiplier_components.md b/doc/components/multiplier_components.md
index ed5f2c60..fe73ee03 100644
--- a/doc/components/multiplier_components.md
+++ b/doc/components/multiplier_components.md
@@ -51,7 +51,7 @@ row slice mult

 A few things to note: first, that we are negating by ones' complement (so we need a -0) and second, these rows do not add up to (18: 10010). For Booth encoded rows to add up properly, they need to be in twos' complement form, and they need to be sign-extended.

- Here is the matrix with a crude sign extension `brute` (the table formatting is available from our [PartialProductGenerator](https://intel.github.io/rohd-hcl/rohd_hcl/PartialProductGenerator-class.html) component). With twos' complementation, and sign bits folded in (note the LSB of each row has a sign term from the previous row), these addends are correctly formed and add to (18: 10010).
+ Here is the matrix with a crude sign extension `brute` (the table formatting is available from our [PartialProductGenerator](https://intel.github.io/rohd-hcl/rohd_hcl/PartialProductGeneratorBase-class.html) component). With twos' complementation, and sign bits folded in (note the LSB of each row has a sign term from the previous row), these addends are correctly formed and add to (18: 10010).

 ```text
 7 6 5 4 3 2 1 0
diff --git a/doc/components/reduction_tree.md b/doc/components/reduction_tree.md
new file mode 100644
index 00000000..3acc52bc
--- /dev/null
+++ b/doc/components/reduction_tree.md
@@ -0,0 +1,53 @@
+# Reduction Tree
+
+The `ReductionTree` component is a general tree generator that allows for arbitrary radix or tree-branching factor in the computation. It takes a sequence of `Logic` values and performs a specified operation at each node of the tree, taking in 'radix' inputs and producing one output. If the operation widens the output (say in addition), then the `ReductionTree` will widen values using either sign-extension or zero-extension as specified.
+
+The input sequence is provided in the form 'List\<Logic\>'. The operation must be provided in the form
+
+ ```dart
+ Logic Function(List<Logic> operands)
+ ```
+
+ and support operand lengths between $[2,radix]$.
+
+The `ReductionTree` does not require the sequence length to be a power of the radix; it can be of arbitrary length.
+
+The resulting tree can be pipelined by specifying the depth of nodes before a pipestage is added. Since the input can be of arbitrary length, paths in the tree may not be balanced, and extra pipestages will be added in shorter sections of the tree to align the computation.
+
+Here is an example radix-4 computation tree using native addition on 79 13-bit inputs, pipelining every 2 operations deep, and producing a single 13-bit result.
+
+```dart
+  Logic addReduce(List<Logic> inputs) {
+    final a = inputs.reduce((v, e) => v + e);
+    return a;
+  }
+  /// Tree reduction using addReduce
+  const width = 13;
+  const length = 79;
+  final vec = <Logic>[];
+
+  final reductionTree = ReductionTree(
+    vec, radix: 4, addReduce, clk: clk, depthToFlop: 2);
+ ```
+
+Here is the same example radix-4 computation tree but using prefix adders on 79 13-bit inputs, pipelining every 2 operations deep, and producing a single 21-bit result, due to width-extension of the prefix adder, adding 1 bit for each addition in 7 levels of the tree.
+
+ ```dart
+  Logic addReduceWithAdders(List<Logic> inputs) {
+    if (inputs.length < 4) {
+      return inputs.reduce((v, e) => v + e);
+    } else {
+      final add0 = ParallelPrefixAdder(inputs[0], inputs[1]);
+      final add1 = ParallelPrefixAdder(inputs[2], inputs[3]);
+      final add2 = ParallelPrefixAdder(add0.sum, add1.sum);
+      return add2.sum;
+    }
+
+  /// Tree reduction using addReduceWithAdders
+  const width = 13;
+  const length = 79;
+  final vec = <Logic>[];
+
+  final reductionTree = ReductionTree(
+    vec, radix: 4, addReduceWithAdders, clk: clk, depthToFlop: 2, signExtend: true);
+ ```
diff --git a/lib/rohd_hcl.dart b/lib/rohd_hcl.dart
index b12b6ead..52c8ba36 100644
--- a/lib/rohd_hcl.dart
+++ b/lib/rohd_hcl.dart
@@ -17,6 +17,7 @@ export 'src/find.dart';
 export 'src/interfaces/interfaces.dart';
 export 'src/memory/memories.dart';
 export 'src/models/models.dart';
+export 'src/reduction_tree.dart';
 export 'src/rotate.dart';
 export 'src/serialization/serialization.dart';
 export 'src/shift_register.dart';
diff --git a/lib/src/reduction_tree.dart b/lib/src/reduction_tree.dart
new file mode 100644
index 00000000..7764632a
--- /dev/null
+++ b/lib/src/reduction_tree.dart
@@ -0,0 +1,144 @@
+// Copyright (C) 2025 Intel Corporation
+// SPDX-License-Identifier: BSD-3-Clause
+//
+// reduction_tree.dart
+// A generator for creating tree reduction computations.
+//
+// 2025 January 10
+// Author: Desmond A Kirkpatrick inputs) operation;
+
+  /// Specified width of input to each reduction node (e.g., binary: radix=2)
+  @protected
+  late final int radix;
+
+  /// When [signExtend] is true, use sign-extension on values,
+  /// otherwise use zero-extension.
+  @protected
+  late final bool signExtend;
+
+  /// Specified depth of nodes at which to flop (requires [clk]).
+  @protected
+  late final int? depthToFlop;
+
+  /// Optional [clk] input to create pipeline.
+  @protected
+  late final Logic? clk;
+
+  /// Optional [reset] input to reset pipeline.
+  @protected
+  late final Logic?
reset;
+
+  /// Optional [enable] input to enable pipeline.
+  @protected
+  late final Logic? enable;
+
+  /// The final output of the tree computation.
+  Logic get out => output('out');
+
+  /// The combinational depth since the last flop. The total compute depth of
+  /// the tree is: depth + flopDepth * depthToFlop;
+  int get depth => _computed.depth;
+
+  /// The flop depth of the tree from the output to the leaves.
+  int get flopDepth => _computed.flopDepth;
+
+  /// Capture the record of compute: the final value, its depth (from last
+  /// flop or input), and its flopDepth if pipelined.
+  late final ({Logic value, int depth, int flopDepth}) _computed;
+
+  /// Generate a tree based on dividing the input [sequence] of a node into
+  /// segments, recursively constructing [radix] child nodes to operate
+  /// on each segment.
+  /// - [sequence] is the input sequence to be reduced using the tree of
+  /// operations.
+  /// - Logic Function(List<Logic> inputs) [operation] is the operation to be
+  /// performed at each node. Note that [operation] can widen the output. The
+  /// logic function must support the operation for 2 to radix inputs.
+  /// - [radix] is the width of reduction at each node in the tree (e.g.,
+  /// binary: radix=2).
+  /// - [signExtend] if true, use sign-extension to widen [Logic] values as
+  /// needed in the tree, otherwise use zero-extension (default).
+  ///
+  /// Optional parameters to be used for creating a pipelined computation tree:
+  /// - [clk], [reset], [enable] are optionally provided to allow for flopping.
+  /// - [depthToFlop] specifies how many nodes deep separate flops.
+  ReductionTree(List<Logic> sequence, this.operation,
+      {this.radix = 2,
+      this.signExtend = false,
+      this.depthToFlop,
+      Logic? clk,
+      Logic? enable,
+      Logic?
reset,
+      super.name = 'reduction_tree'}) {
+    if (sequence.isEmpty) {
+      throw RohdHclException("Don't use ReductionTree "
+          'with an empty sequence');
+    }
+    sequence = [
+      for (var i = 0; i < sequence.length; i++)
+        addInput('seq$i', sequence[i], width: sequence[i].width)
+    ];
+    this.clk = (clk != null) ? addInput('clk', clk) : null;
+    this.enable = (enable != null) ? addInput('enable', enable) : null;
+    this.reset = (reset != null) ? addInput('reset', reset) : null;
+
+    _computed = reductionTreeRecurse(sequence);
+    addOutput('out', width: _computed.value.width) <= _computed.value;
+  }
+
+  /// Local conditional flop using module reset/enable
+  Logic localFlop(Logic d, {bool doFlop = false}) =>
+      condFlop(doFlop ? clk : null, reset: reset, en: enable, d);
+
+  /// Recursively construct the computation tree
+  ({Logic value, int depth, int flopDepth}) reductionTreeRecurse(
+      List<Logic> seq) {
+    if (seq.length < radix) {
+      return (value: operation(seq), depth: 0, flopDepth: 0);
+    } else {
+      final results = <({Logic value, int depth, int flopDepth})>[];
+      final segment = seq.length ~/ radix;
+      var pos = 0;
+      for (var i = 0; i < radix; i++) {
+        final c = reductionTreeRecurse(seq
+            .getRange(pos, (i < radix - 1) ? pos + segment : seq.length)
+            .toList());
+        results.add(c);
+        pos += segment;
+      }
+      final flopDepth = results.map((c) => c.flopDepth).reduce(max);
+      final treeDepth = results.map((c) => c.depth).reduce(max);
+
+      final alignedResults = results
+          .map((c) => localFlop(c.value, doFlop: c.flopDepth < flopDepth));
+
+      final depthFlop = (depthToFlop != null) &&
+          (treeDepth > 0) & (treeDepth % depthToFlop! == 0);
+      final resultsFlop =
+          alignedResults.map((r) => localFlop(r, doFlop: depthFlop));
+
+      final alignWidth = results.map((c) => c.value.width).reduce(max);
+      final resultsExtend = resultsFlop.map((r) =>
+          signExtend ?
r.signExtend(alignWidth) : r.zeroExtend(alignWidth));
+
+      final computed = operation(resultsExtend.toList());
+      return (
+        value: computed,
+        depth: depthFlop ? 0 : treeDepth + 1,
+        flopDepth: flopDepth + (depthFlop ? 1 : 0)
+      );
+    }
+  }
+}
diff --git a/test/reduction_tree_test.dart b/test/reduction_tree_test.dart
new file mode 100644
index 00000000..00eae5da
--- /dev/null
+++ b/test/reduction_tree_test.dart
@@ -0,0 +1,157 @@
+// Copyright (C) 2025 Intel Corporation
+// SPDX-License-Identifier: BSD-3-Clause
+//
+// reduction_tree_test.dart
+// Tests of the ReductionTreeNode generator.
+//
+// 2025 January 8
+// Author: Desmond A Kirkpatrick inputs) {
+  if (inputs.length < 4) {
+    return inputs.reduce((v, e) => v + e);
+  } else {
+    final add0 = ParallelPrefixAdder(inputs[0], inputs[1]);
+    final add1 = ParallelPrefixAdder(inputs[2], inputs[3]);
+    final add2 = ParallelPrefixAdder(add0.sum, add1.sum);
+    return add2.sum;
+  }
+}
+
+void main() {
+  tearDown(() async {
+    await Simulator.reset();
+  });
+  Logic addReduce(List<Logic> inputs) {
+    final a = inputs.reduce((v, e) => v + e);
+    return a;
+  }
+
+  test('reduction tree of add operations -- quick test', () async {
+    const width = 13;
+    const length = 79;
+    final vec = <Logic>[];
+    // First sum will be length *(length-1) /2
+    var count = 0;
+    for (var i = 0; i < length; i++) {
+      vec.add(Const(i, width: width));
+      count = count + i;
+    }
+    for (var reduce = 2; reduce < length; reduce++) {
+      final prefixAdd = ReductionTree(vec, radix: reduce, addReduce);
+      expect(prefixAdd.out.value.toInt(), equals(count));
+    }
+  });
+  test('reduction tree of adders -- large', () async {
+    final clk = SimpleClockGenerator(10).clk;
+
+    const width = 17;
+    const length = 290;
+    final vec = <Logic>[];
+    // First sum will be length *(length-1) /2
+    var count = 0;
+    for (var i = 0; i < length; i++) {
+      vec.add(Const(i, width: width));
+      count = count + i;
+    }
+    for (var reduce = 2; reduce < length; reduce++) {
+      final prefixAdd = ReductionTree(vec, radix: reduce,
addReduce, clk: clk);
+      expect(prefixAdd.out.value.toInt(), equals(count));
+    }
+  });
+
+  test('reduction tree of adders -- large, pipelined', () async {
+    final clk = SimpleClockGenerator(10).clk;
+
+    const width = 17;
+    const length = 290;
+    final vec = <Logic>[];
+    // First sum will be length *(length-1) /2
+    for (var i = 0; i < length; i++) {
+      vec.add(Const(i, width: width));
+    }
+    const reduce = 4;
+    final prefixAdd =
+        ReductionTree(vec, radix: reduce, addReduce, clk: clk, depthToFlop: 1);
+
+    await prefixAdd.build();
+    unawaited(Simulator.run());
+    var cycles = 0;
+    await clk.nextNegedge;
+    cycles++;
+    // second sum will be length
+    for (var i = 0; i < length; i++) {
+      vec[i].inject(1);
+    }
+    await clk.nextNegedge;
+    cycles++;
+    // third sum will be length *2
+    for (var i = 0; i < length; i++) {
+      vec[i].inject(2);
+    }
+    if (prefixAdd.flopDepth > cycles) {
+      await clk.waitCycles(prefixAdd.flopDepth - cycles);
+      await clk.nextNegedge;
+    }
+    expect(prefixAdd.out.value.toInt(), equals(length * (length - 1) / 2));
+    await clk.nextNegedge;
+    expect(prefixAdd.out.value.toInt(), equals(length));
+    await clk.nextNegedge;
+    expect(prefixAdd.out.value.toInt(), equals(length * 2));
+    await clk.nextNegedge;
+    await clk.nextNegedge;
+    await clk.nextNegedge;
+    await Simulator.endSimulation();
+  });
+
+  test('reduction tree of prefix adders -- large, pipelined, radix 4',
+      () async {
+    final clk = SimpleClockGenerator(10).clk;
+
+    const width = 17;
+    const length = 290;
+    final vec = <Logic>[];
+    // First sum will be length *(length-1) /2
+    for (var i = 0; i < length; i++) {
+      vec.add(Const(i, width: width));
+    }
+    const reduce = 4;
+    final prefixAdd = ReductionTree(
+        vec, radix: reduce, addReduceAdders, clk: clk, depthToFlop: 1);
+
+    await prefixAdd.build();
+    unawaited(Simulator.run());
+    var cycles = 0;
+    await clk.nextNegedge;
+    cycles++;
+    // second sum will be length
+    for (var i = 0; i < length; i++) {
+      vec[i].inject(1);
+    }
+    await clk.nextNegedge;
+    cycles++;
+    // third sum
will be length *2
+    for (var i = 0; i < length; i++) {
+      vec[i].inject(2);
+    }
+    if (prefixAdd.flopDepth > cycles) {
+      await clk.waitCycles(prefixAdd.flopDepth - cycles);
+      await clk.nextNegedge;
+    }
+    expect(prefixAdd.out.value.toInt(), equals(length * (length - 1) / 2));
+    await clk.nextNegedge;
+    expect(prefixAdd.out.value.toInt(), equals(length));
+    await clk.nextNegedge;
+    expect(prefixAdd.out.value.toInt(), equals(length * 2));
+    await clk.nextNegedge;
+    await clk.nextNegedge;
+    await clk.nextNegedge;
+    await Simulator.endSimulation();
+  });
+}
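
Note on the recursion in `reductionTreeRecurse`: the segment splitting can be hard to picture from the diff alone, so here is a minimal pure-Dart sketch of the same partitioning, run on plain integers instead of `Logic` signals. It is an illustration only and not part of the change: `reduceModel` and `sumOp` are hypothetical names, pipelining (`depthToFlop`, flop alignment) and width extension are left out, and a radix of at least 2 is assumed.

```dart
import 'dart:math';

/// Hypothetical stand-in for the node operation: sums a list of ints.
int sumOp(List<int> xs) => xs.reduce((a, b) => a + b);

/// Pure-Dart model of the segment-splitting recursion (no ROHD types).
/// Assumes radix >= 2 and a non-empty input sequence.
({int value, int depth}) reduceModel(
    List<int> seq, int radix, int Function(List<int>) op) {
  // Leaf case: fewer than `radix` operands are reduced directly.
  if (seq.length < radix) {
    return (value: op(seq), depth: 0);
  }
  // Split into `radix` segments; the last segment absorbs the remainder.
  final segment = seq.length ~/ radix;
  final children = <({int value, int depth})>[];
  var pos = 0;
  for (var i = 0; i < radix; i++) {
    final end = (i < radix - 1) ? pos + segment : seq.length;
    children.add(reduceModel(seq.sublist(pos, end), radix, op));
    pos += segment;
  }
  // This node sits one level above its deepest child.
  final maxDepth = children.map((c) => c.depth).reduce(max);
  return (
    value: op([for (final c in children) c.value]),
    depth: maxDepth + 1
  );
}

void main() {
  // Same shape as the documentation example: 79 inputs, radix 4.
  final inputs = List<int>.generate(79, (i) => i);
  final r = reduceModel(inputs, 4, sumOp);
  print('sum=${r.value} depth=${r.depth}');
}
```

Because the last segment takes whatever is left over, the branches can have different depths whenever the input length is not a power of the radix. That imbalance is what the extra alignment flops in the pipelined tree compensate for.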