Tree component #155

Open
wants to merge 25 commits into
base: main
a283f76
general binary tree component and test
desmonddak Jan 9, 2025
b185b1d
documentation and an experimental LogicArray binary tree
desmonddak Jan 9, 2025
b9fdf61
working binary tree
desmonddak Jan 10, 2025
30b9eb3
updated code documentation
desmonddak Jan 10, 2025
a0760ea
working reduction tree
desmonddak Jan 12, 2025
57a7f76
refactored into an iterative function
desmonddak Jan 12, 2025
d73020b
preparted to merge iter into top class
desmonddak Jan 12, 2025
1563fd8
merged ReductionTree module with internal recursion, documentation
desmonddak Jan 12, 2025
343e45f
use floating_point.md from fpmult branch
desmonddak Jan 12, 2025
7a10bfe
don't use WaveDumper in regressions
desmonddak Jan 12, 2025
8593dcc
remove binary_tree as reduction_tree covers
desmonddak Jan 12, 2025
8db0a58
reduction_tree in rohd_hcl, remove binary_tree
desmonddak Jan 12, 2025
e7a24f7
forgot to saveall in vscode -- rohd_hcl.dart fix
desmonddak Jan 12, 2025
8fec4f9
Binary tree replaced bin reduction tree.
desmonddak Jan 13, 2025
fac01f3
better use of dart maps
desmonddak Jan 13, 2025
edac48f
use a record for capturing state
desmonddak Jan 13, 2025
a366b84
improve documentation with better examples
desmonddak Jan 13, 2025
8c10cae
doc fix, more code neatening for clarity
desmonddak Jan 13, 2025
aadd721
doc cleanup
desmonddak Jan 13, 2025
9d02b42
more doc cleanup
desmonddak Jan 13, 2025
60a4f03
more documentation cleanup in code and markdown
desmonddak Jan 13, 2025
d5d32f6
added reduction_tree to README.md
desmonddak Jan 13, 2025
c43be66
Merge branch 'main' into treeComponent
desmonddak Jan 17, 2025
7393863
doc fix
desmonddak Jan 17, 2025
5a11f8a
dead link fix
desmonddak Jan 17, 2025
1 change: 1 addition & 0 deletions doc/README.md
@@ -95,6 +95,7 @@ Some in-development items will have opened issues, as well. Feel free to create
- NoC's
- Coherent
- Non-Coherent
- [Reduction Tree](./components/reduction_tree.md)
- Memory
- [Register File](./components/memory.md#register-files)
- [Masking](./components/memory.md#masks)
13 changes: 7 additions & 6 deletions doc/components/floating_point.md
@@ -22,7 +22,7 @@ Appropriate string representations, comparison operations, and operators are ava

### Floating Point Constants

The various IEEE constants representing corner cases of the field of floating-point values for a given size of [FloatingPointValue](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPointValue-class.html): infinities, zeros, limits for normal (e.g. mantissa in the range of $[1,2)$ and sub-normal numbers (zero exponent, and mantissa <1).
The various IEEE constants representing corner cases of the field of floating-point values for a given size of [FloatingPointValue](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPointValue-class.html): infinities, zeros, limits for normal (e.g. mantissa in the range of $[1,2)$) and sub-normal numbers (zero exponent, and mantissa <1).

For any basic arbitrary width `FloatingPointValue` ROHD-HCL supports the following constants in that format.

@@ -36,8 +36,8 @@ For any basic arbitrary width `FloatingPointValue` ROHD-HCL supports the followi
- `one`: The number one
- `smallestLargerThanOne`: Smallest number greater than one
- `largestNormal`: Largest positive number, most positive exponent, full mantissa
- `infinity`: Largest possible number: all 1s in the exponent, all 0s in the mantissa
- `nan`: Not a Number, demarked by all 1s in exponent and any 1 in mantissa (we use the LSB)
- `infinity`: Largest possible number: all 1s in the exponent, all 0s in the mantissa
- `nan`: Not a Number, designated by all 1s in exponent and any 1 in mantissa (we use the LSB)

### Special subtypes

@@ -73,6 +73,7 @@ A very basic [FloatingPointMultiplierSimple] component is available which does n

It has options to control its performance:

- 'radix': used to specify the radix of the Booth encoder (default radix=4: options are [2,4,8,16])'.
- adderGen': used to specify the kind of [Adder] used for key functions like the mantissa addition. Defaults to [NativeAdder], but you can select a [ParallelPrefixAdder] of your choice.
- 'ppTree': used to specify the type of ['ParallelPrefix'](https://intel.github.io/rohd-hcl/rohd_hcl/ParallelPrefix-class.html) used in the pther critical functions like leading-one detect.
- `radix`: used to specify the radix of the Booth encoder (default radix=4: options are [2,4,8,16]).
- `adderGen`: used to specify the kind of [Adder] used for key functions like the mantissa addition. Defaults to [NativeAdder], but you can select a [ParallelPrefixAdder] of your choice.
- `seGen`: type of sign extension routine used, base class is [PartialProductSignExtension].
- `ppTree`: used to specify the type of ['ParallelPrefix'](https://intel.github.io/rohd-hcl/rohd_hcl/ParallelPrefix-class.html) used in the other critical functions like leading-one detect.
2 changes: 1 addition & 1 deletion doc/components/multiplier_components.md
@@ -51,7 +51,7 @@ row slice mult

A few things to note: first, that we are negating by ones' complement (so we need a -0) and second, these rows do not add up to (18: 10010). For Booth encoded rows to add up properly, they need to be in twos' complement form, and they need to be sign-extended.

Here is the matrix with a crude sign extension `brute` (the table formatting is available from our [PartialProductGenerator](https://intel.github.io/rohd-hcl/rohd_hcl/PartialProductGenerator-class.html) component). With twos' complementation, and sign bits folded in (note the LSB of each row has a sign term from the previous row), these addends are correctly formed and add to (18: 10010).
Here is the matrix with a crude sign extension `brute` (the table formatting is available from our [PartialProductGenerator](https://intel.github.io/rohd-hcl/rohd_hcl/PartialProductGeneratorBase-class.html) component). With twos' complementation, and sign bits folded in (note the LSB of each row has a sign term from the previous row), these addends are correctly formed and add to (18: 10010).

```text
7 6 5 4 3 2 1 0
53 changes: 53 additions & 0 deletions doc/components/reduction_tree.md
@@ -0,0 +1,53 @@
# Reduction Tree

The `ReductionTree` component is a general tree generator that supports an arbitrary radix (tree-branching factor) in the computation. It takes a sequence of `Logic` values and performs a specified operation at each node of the tree, taking in `radix` inputs and producing one output. If the operation widens the output (as addition can), then the `ReductionTree` will widen values using either sign-extension or zero-extension, as specified.

The input sequence is provided in the form `List<Logic>`. The operation must be provided in the form

```dart
Logic Function(List<Logic> operands)
```

and must support operand lists with lengths in the range $[2,radix]$.

The `ReductionTree` does not require the sequence length to be a power of the radix; it can be of arbitrary length.
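As a rough illustration of how a sequence of arbitrary length can be divided among `radix` children, here is a minimal pure-Dart sketch. The helper name `splitSegments` is hypothetical and not part of ROHD-HCL; it assumes the remainder is given to the last segment and that the sequence has at least `radix` elements.

```dart
/// Hypothetical helper (not part of ROHD-HCL): splits [seq] into [radix]
/// contiguous segments, handing any remainder to the last segment.
/// Assumes seq.length >= radix.
List<List<T>> splitSegments<T>(List<T> seq, int radix) {
  final segment = seq.length ~/ radix;
  return [
    for (var i = 0; i < radix; i++)
      seq.sublist(
          i * segment, i < radix - 1 ? (i + 1) * segment : seq.length),
  ];
}

void main() {
  final inputs = List<int>.generate(79, (i) => i);
  final lengths = splitSegments(inputs, 4).map((s) => s.length).toList();
  print(lengths); // [19, 19, 19, 22]
}
```

With 79 inputs and radix 4, three children receive 19 elements and the last receives 22, so the subtrees are not all the same size.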

The resulting tree can be pipelined by specifying the depth of nodes before a pipestage is added. Since the input can be of arbitrary length, paths in the tree may not be balanced, and extra pipestages will be added in shorter sections of the tree to align the computation.
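The depth bookkeeping behind this pipelining can be sketched without any hardware. The function below is a hypothetical model (the name `treeDepths` is an assumption, not a ROHD-HCL API) that tracks only the combinational depth since the last flop and the number of flop stages, assuming a flop is inserted whenever the child depth is a non-zero multiple of `depthToFlop`.

```dart
import 'dart:math';

/// Hypothetical model (not part of ROHD-HCL): tracks the combinational depth
/// since the last flop and the number of flop stages for a tree reducing
/// [length] inputs with the given [radix]. No hardware is built.
({int depth, int flopDepth}) treeDepths(
    int length, int radix, int? depthToFlop) {
  if (length < radix) {
    return (depth: 0, flopDepth: 0);
  }
  final segment = length ~/ radix;
  final children = [
    for (var i = 0; i < radix; i++)
      treeDepths(i < radix - 1 ? segment : length - segment * (radix - 1),
          radix, depthToFlop),
  ];
  final flopDepth = children.map((c) => c.flopDepth).reduce(max);
  final treeDepth = children.map((c) => c.depth).reduce(max);
  // Flop when the deepest child has reached a multiple of depthToFlop.
  final doFlop = depthToFlop != null &&
      treeDepth > 0 &&
      treeDepth % depthToFlop == 0;
  return (
    depth: doFlop ? 0 : treeDepth + 1,
    flopDepth: flopDepth + (doFlop ? 1 : 0)
  );
}

void main() {
  final piped = treeDepths(79, 4, 2);
  print('depth=${piped.depth} flopDepth=${piped.flopDepth}'); // depth=1 flopDepth=1
  final comb = treeDepths(79, 4, null);
  print('depth=${comb.depth} flopDepth=${comb.flopDepth}'); // depth=4 flopDepth=0
}
```

This model leaves out the extra alignment flops on shorter paths, since those equalize latency between siblings without changing the reported flop depth.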

Here is an example radix-4 computation tree using native addition on 79 13-bit inputs, pipelining every 2 operations deep, and producing a single 13-bit result.

```dart
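Logic addReduce(List<Logic> inputs) {
  final a = inputs.reduce((v, e) => v + e);
  return a;
}

/// Tree reduction using addReduce
const width = 13;
const length = 79;
// Placeholder clock and inputs; connect these to real signals in a design.
final clk = SimpleClockGenerator(10).clk;
final vec = List<Logic>.generate(length, (i) => Logic(width: width));

final reductionTree = ReductionTree(
    vec, radix: 4, addReduce, clk: clk, depthToFlop: 2);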
```

Here is the same example radix-4 computation tree but using prefix adders on 79 13-bit inputs, pipelining every 2 operations deep, and producing a single 21-bit result, due to width-extension of the prefix adder, adding 1 bit for each addition in 7 levels of the tree.

```dart
Logic addReduceWithAdders(List<Logic> inputs) {
  if (inputs.length < 4) {
    return inputs.reduce((v, e) => v + e);
  } else {
    final add0 = ParallelPrefixAdder(inputs[0], inputs[1]);
    final add1 = ParallelPrefixAdder(inputs[2], inputs[3]);
    final add2 = ParallelPrefixAdder(add0.sum, add1.sum);
    return add2.sum;
  }
}

/// Tree reduction using addReduceWithAdders
const width = 13;
const length = 79;
// Placeholder clock and inputs; connect these to real signals in a design.
final clk = SimpleClockGenerator(10).clk;
final vec = List<Logic>.generate(length, (i) => Logic(width: width));

final reductionTree = ReductionTree(
    vec, radix: 4, addReduceWithAdders, clk: clk, depthToFlop: 2, signExtend: true);
```
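The 21-bit result width can be checked with a small pure-Dart model. This is a sketch under stated assumptions: the name `outputWidth` is hypothetical, nodes with fewer than `radix` inputs are assumed to use native addition (which keeps the width), and each full node is assumed to add 2 bits (two levels of prefix adders, 1 bit each) after aligning its inputs to the widest child.

```dart
import 'dart:math';

/// Hypothetical width model (not part of ROHD-HCL) for the prefix-adder
/// example: leaves use native '+' and keep [inputWidth]; every full node
/// aligns its children to the widest one and adds [growthPerNode] bits.
int outputWidth(int length, int radix, int inputWidth,
    {int growthPerNode = 2}) {
  if (length < radix) {
    return inputWidth;
  }
  final segment = length ~/ radix;
  final widths = [
    for (var i = 0; i < radix; i++)
      outputWidth(i < radix - 1 ? segment : length - segment * (radix - 1),
          radix, inputWidth,
          growthPerNode: growthPerNode),
  ];
  return widths.reduce(max) + growthPerNode;
}

void main() {
  print(outputWidth(79, 4, 13)); // 21
}
```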
1 change: 1 addition & 0 deletions lib/rohd_hcl.dart
@@ -17,6 +17,7 @@ export 'src/find.dart';
export 'src/interfaces/interfaces.dart';
export 'src/memory/memories.dart';
export 'src/models/models.dart';
export 'src/reduction_tree.dart';
export 'src/rotate.dart';
export 'src/serialization/serialization.dart';
export 'src/shift_register.dart';
144 changes: 144 additions & 0 deletions lib/src/reduction_tree.dart
@@ -0,0 +1,144 @@
// Copyright (C) 2025 Intel Corporation
// SPDX-License-Identifier: BSD-3-Clause
//
// reduction_tree.dart
// A generator for creating tree reduction computations.
//
// 2025 January 10
// Author: Desmond A Kirkpatrick <desmond.a.kirkpatrick@intel.com>

import 'dart:math';

import 'package:meta/meta.dart';
import 'package:rohd/rohd.dart';
import 'package:rohd_hcl/rohd_hcl.dart';

/// A generator which constructs a tree of radix-input / 1-output modules.
class ReductionTree extends Module {
/// The radix-sized input operation to be performed at each node.
@protected
final Logic Function(List<Logic> inputs) operation;

/// Specified width of input to each reduction node (e.g., binary: radix=2)
@protected
late final int radix;

/// When [signExtend] is true, use sign-extension on values,
/// otherwise use zero-extension.
@protected
late final bool signExtend;

/// Specified depth of nodes at which to flop (requires [clk]).
@protected
late final int? depthToFlop;

/// Optional [clk] input to create pipeline.
@protected
late final Logic? clk;

/// Optional [reset] input to reset pipeline.
@protected
late final Logic? reset;

/// Optional [enable] input to enable pipeline.
@protected
late final Logic? enable;

/// The final output of the tree computation.
Logic get out => output('out');

/// The combinational depth since the last flop. The total compute depth of
/// the tree is: depth + flopDepth * depthToFlop;
int get depth => _computed.depth;

/// The flop depth of the tree from the output to the leaves.
Review comment (Contributor): Is this equal to the latency in cycles?

int get flopDepth => _computed.flopDepth;

/// Capture the record of compute: the final value, its depth (from last
/// flop or input), and its flopDepth if pipelined.
late final ({Logic value, int depth, int flopDepth}) _computed;

/// Generate a tree based on dividing the input [sequence] of a node into
/// segments, recursively constructing [radix] child nodes to operate
/// on each segment.
/// - [sequence] is the input sequence to be reduced using the tree of
/// operations.
/// - Logic Function(List<Logic> inputs) [operation] is the operation to be
/// performed at each node. Note that [operation] can widen the output. The
/// logic function must support the operation for 2 to radix inputs.
/// - [radix] is the width of reduction at each node in the tree (e.g.,
/// binary: radix=2).
/// - [signExtend] if true, use sign-extension to widen [Logic] values as
/// needed in the tree, otherwise use zero-extension (default).
///
/// Optional parameters to be used for creating a pipelined computation tree:
/// - [clk], [reset], [enable] are optionally provided to allow for flopping.
Review comment (Contributor): Is the enable like a stall?

/// - [depthToFlop] specifies how many nodes deep separate flops.
ReductionTree(List<Logic> sequence, this.operation,
{this.radix = 2,
this.signExtend = false,
this.depthToFlop,
Logic? clk,
Logic? enable,
Logic? reset,
super.name = 'reduction_tree'}) {
if (sequence.isEmpty) {
throw RohdHclException("Don't use ReductionTree "
'with an empty sequence');
}
sequence = [
for (var i = 0; i < sequence.length; i++)
addInput('seq$i', sequence[i], width: sequence[i].width)
];
this.clk = (clk != null) ? addInput('clk', clk) : null;
this.enable = (enable != null) ? addInput('enable', enable) : null;
this.reset = (reset != null) ? addInput('reset', reset) : null;

_computed = reductionTreeRecurse(sequence);
addOutput('out', width: _computed.value.width) <= _computed.value;
}

/// Local conditional flop using module reset/enable
Logic localFlop(Logic d, {bool doFlop = false}) =>
condFlop(doFlop ? clk : null, reset: reset, en: enable, d);

/// Recursively construct the computation tree
({Logic value, int depth, int flopDepth}) reductionTreeRecurse(
List<Logic> seq) {
if (seq.length < radix) {
return (value: operation(seq), depth: 0, flopDepth: 0);
} else {
final results = <({Logic value, int depth, int flopDepth})>[];
final segment = seq.length ~/ radix;
var pos = 0;
for (var i = 0; i < radix; i++) {
final c = reductionTreeRecurse(seq
.getRange(pos, (i < radix - 1) ? pos + segment : seq.length)
.toList());
results.add(c);
pos += segment;
}
final flopDepth = results.map((c) => c.flopDepth).reduce(max);
final treeDepth = results.map((c) => c.depth).reduce(max);

final alignedResults = results
.map((c) => localFlop(c.value, doFlop: c.flopDepth < flopDepth));

final depthFlop = (depthToFlop != null) &&
(treeDepth > 0) && (treeDepth % depthToFlop! == 0);
final resultsFlop =
alignedResults.map((r) => localFlop(r, doFlop: depthFlop));

final alignWidth = results.map((c) => c.value.width).reduce(max);
final resultsExtend = resultsFlop.map((r) =>
signExtend ? r.signExtend(alignWidth) : r.zeroExtend(alignWidth));

final computed = operation(resultsExtend.toList());
return (
value: computed,
depth: depthFlop ? 0 : treeDepth + 1,
flopDepth: flopDepth + (depthFlop ? 1 : 0)
);
}
}
}
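The recursion above can be tried on plain values to see its shape without building hardware. The following is a minimal sketch, not the component itself: `treeReduce` is a hypothetical generic function that mirrors the segmentation and per-node operation but omits flops and width extension.

```dart
/// Hypothetical value-level analogue of the recursion (not part of ROHD-HCL):
/// splits [seq] into [radix] segments, reduces each recursively, then applies
/// [operation] to the child results. Flops and width extension are omitted.
T treeReduce<T>(List<T> seq, int radix, T Function(List<T>) operation) {
  if (seq.length < radix) {
    return operation(seq);
  }
  final segment = seq.length ~/ radix;
  final results = <T>[
    for (var i = 0; i < radix; i++)
      treeReduce(
          seq.sublist(
              i * segment, i < radix - 1 ? (i + 1) * segment : seq.length),
          radix,
          operation),
  ];
  return operation(results);
}

void main() {
  int sum(List<int> xs) => xs.reduce((a, b) => a + b);
  final inputs = List<int>.generate(79, (i) => i + 1);
  print(treeReduce(inputs, 4, sum)); // 3160, same as a linear sum of 1..79
}
```

For an associative operation such as addition, the tree result matches a linear reduction; only the grouping of operands differs.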